ARRAYS OF NUCLEIC ACID PROBES ON BIOLOGICAL CHIPS

Cross-Reference to Related Application 
This application is a continuation-in-part of USSN
08/284,064, filed August 2, 1994, which is a continuation-in-part
of USSN 08/143,312, filed October 26, 1993, each of which
is incorporated by reference in its entirety for all purposes.
Research leading to the invention was funded in part by NIH
grant No. 1R01HG00813-01, and the government may have certain
rights to the invention.

Background of the Invention 
Field of the Invention 

The present invention provides arrays of oligonucleotide 
probes immobilized in microfabricated patterns on silica chips
for analyzing molecular interactions of biological interest. 
The invention therefore relates to diverse fields impacted by 
the nature of molecular interaction, including chemistry, 
biology, medicine, and medical diagnostics. 

Description of Related Art 

Oligonucleotide probes have long been used to detect 
complementary nucleic acid sequences in a nucleic acid of 
interest (the "target" nucleic acid). In some assay formats,
the oligonucleotide probe is tethered, i.e., by covalent 
attachment, to a solid support, and arrays of oligonucleotide 
probes immobilized on solid supports have been used to detect 
specific nucleic acid sequences in a target nucleic acid. 
See, e.g., PCT patent publication Nos. WO 89/10977 and 
89/11548. Others have proposed the use of large numbers of 
oligonucleotide probes to provide the complete nucleic acid 
sequence of a target nucleic acid but failed to provide an 
enabling method for using arrays of immobilized probes for 
this purpose. See U.S. Patent Nos. 5,202,231 and 5,002,867 
and PCT patent publication No. WO 93/17126.

The development of VLSIPS™ technology has provided
methods for making very large arrays of oligonucleotide probes
in very small arrays. See U.S. Patent No. 5,143,854 and PCT
patent publication Nos. WO 90/15070 and 92/10092, each of
which is incorporated herein by reference. U.S. Patent
application Serial No. 082,937, filed June 25, 1993, describes
methods for making arrays of oligonucleotide probes that can
be used to provide the complete sequence of a target nucleic
acid and to detect the presence of a nucleic acid containing a
specific nucleotide sequence.

Microfabricated arrays of large numbers of
oligonucleotide probes, called "DNA chips," offer great promise
for a wide variety of applications. New methods and reagents
are required to realize this promise, and the present
invention helps meet that need.

SUMMARY OF THE INVENTION 
The invention provides several strategies employing
immobilized arrays of probes for comparing a reference
sequence of known sequence with a target sequence showing
substantial similarity with the reference sequence, but
differing in the presence of, e.g., mutations. In a first
embodiment, the invention provides a tiling strategy employing
an array of immobilized oligonucleotide probes comprising at
least two sets of probes. A first probe set comprises a
plurality of probes, each probe comprising a segment of at
least three nucleotides exactly complementary to a subsequence
of the reference sequence, the segment including at least one
interrogation position complementary to a corresponding
nucleotide in the reference sequence. A second probe set
comprises a corresponding probe for each probe in the first
probe set, the corresponding probe in the second probe set
being identical to a sequence comprising the corresponding
probe from the first probe set or a subsequence of at least
three nucleotides thereof that includes the at least one
interrogation position, except that the at least one
interrogation position is occupied by a different nucleotide
in each of the two corresponding probes from the first and
second probe sets. The probes in the first probe set have at
least two interrogation positions corresponding to two
contiguous nucleotides in the reference sequence. One
interrogation position corresponds to one of the contiguous
nucleotides, and the other interrogation position to the
other.

In a second embodiment, the invention provides a tiling 
strategy employing an array comprising four probe sets. A 
first probe set comprises a plurality of probes, each probe 
comprising a segment of at least three nucleotides exactly 
complementary to a subsequence of the reference sequence, the 
segment including at least one interrogation position 
complementary to a corresponding nucleotide in the reference 
sequence. Second, third and fourth probe sets each comprise a 
corresponding probe for each probe in the first probe set. 
The probes in the second, third and fourth probe sets are 
identical to a sequence comprising the corresponding probe 
from the first probe set or a subsequence of at least three 
nucleotides thereof that includes the at least one 
interrogation position, except that the at least one 
interrogation position is occupied by a different nucleotide 
in each of the four corresponding probes from the four probe 
sets. The first probe set often has at least 100 
interrogation positions corresponding to 100 contiguous 
nucleotides in the reference sequence. Sometimes the first 
probe set has an interrogation position corresponding to every 
nucleotide in the reference sequence. The segment of 
complementarity within the probe set is usually about 9-21 
nucleotides. Although probes may contain leading or trailing
sequences in addition to the 9-21 nucleotide segment, many probes
consist exclusively of a 9-21 nucleotide segment of complementarity.
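
The four-probe-set tiling described above can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function names, the default probe length, the placement of the interrogation position at the centre of the segment, and the convention of writing each probe aligned base-for-base with the reference window are all hypothetical choices, not taken from the specification.

```python
# Illustrative sketch only; names, probe length and orientation are assumptions.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
BASES = "ACGT"


def complement(seq):
    """Return the base-by-base complement of seq, in the same left-to-right order."""
    return "".join(COMPLEMENT[b] for b in seq)


def tile(reference, probe_len=15):
    """Yield (position, column) for each interrogated reference position.

    column maps each of A, C, G, T to one probe. All four probes share the
    same segment of complementarity and differ only at the interrogation
    position, which is assumed here to sit at the centre of the segment.
    """
    half = probe_len // 2
    for pos in range(half, len(reference) - half):
        window = reference[pos - half : pos - half + probe_len]
        wildtype = complement(window)  # member of the first probe set
        column = {
            base: wildtype[:half] + base + wildtype[half + 1 :]
            for base in BASES
        }
        yield pos, column


if __name__ == "__main__":
    reference = "ACGTTGCAAGGC"  # made-up reference sequence
    for pos, column in tile(reference, probe_len=9):
        print(pos, column)
```

In each column exactly one of the four probes is perfectly complementary to the reference; the other three correspond to the second, third and fourth probe sets.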

In a third embodiment, the invention provides immobilized 
arrays of probes tiled for multiple reference sequences. One 
such array comprises at least one pair of first and second 
probe groups, each group comprising first and second sets of 
probes as defined in the first embodiment. Each probe in the 
first probe set from the first group is exactly complementary 
to a subsequence of a first reference sequence, and each probe 
in the first probe set from the second group is exactly
complementary to a subsequence of a second reference sequence.
Thus, the first group of probes are tiled with respect to a
first reference sequence and the second group of probes with
respect to a second reference sequence. Each group of probes
can also include third and fourth sets of probes as defined in
the second embodiment. In some arrays of this type, the
second reference sequence is a mutated form of the first 
reference sequence. 

In a fourth embodiment, the invention provides arrays for
block tiling. Block tiling is a species of the general tiling
strategies described above. The usual unit of a block tiling
array is a group of probes comprising a wildtype probe, a
first set of three mutant probes and a second set of three
mutant probes. The wildtype probe comprises a segment of at
least three nucleotides exactly complementary to a subsequence
of a reference sequence. The segment has at least first and
second interrogation positions corresponding to first and
second nucleotides in the reference sequence. The probes in
the first set of three mutant probes are each identical to a
sequence comprising the wildtype probe or a subsequence of at
least three nucleotides thereof including the first and second
interrogation positions, except in the first interrogation
position, which is occupied by a different nucleotide in each
of the three mutant probes and the wildtype probe. The probes
in the second set of three mutant probes are each identical to
a sequence comprising the wildtype probe or a subsequence of
at least three nucleotides thereof including the first and
second interrogation positions, except in the second
interrogation position, which is occupied by a different
nucleotide in each of the three mutant probes and the wildtype
probe.
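
A minimal sketch of one block tiling group follows, assuming the simplest case in which every mutant probe has the same length as the wildtype probe. The function name and the dictionary layout are hypothetical and serve only to show how one wildtype probe is shared by several sets of three mutant probes.

```python
# Illustrative sketch only; the function name and data layout are assumptions.
BASES = "ACGT"


def block_tiling_group(wildtype_probe, interrogation_positions):
    """Build one block tiling group.

    Returns the wildtype probe together with, for each interrogation
    position, the three mutant probes that differ from the wildtype probe
    only at that position.
    """
    mutants = {}
    for pos in interrogation_positions:
        wt_base = wildtype_probe[pos]
        mutants[pos] = [
            wildtype_probe[:pos] + base + wildtype_probe[pos + 1 :]
            for base in BASES
            if base != wt_base
        ]
    return {"wildtype": wildtype_probe, "mutants": mutants}


if __name__ == "__main__":
    group = block_tiling_group("TGCAACGTT", [3, 4, 5])
    total = 1 + sum(len(v) for v in group["mutants"].values())
    print(total)  # one wildtype probe plus three mutants per position
```

With k interrogation positions a group holds 1 + 3k probes, compared with 4k probes when each position is given its own wildtype probe.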

In a fifth embodiment, the invention provides methods of 
comparing a target sequence with a reference sequence using 
arrays of immobilized pooled probes. The arrays employed in
these methods represent a further species of the general
tiling arrays noted above. In these methods, variants of a
reference sequence differing from the reference sequence in at
least one nucleotide are identified and each is assigned a
designation. An array of pooled probes is provided, with each
pool occupying a separate cell of the array. Each pool 
comprises a probe comprising a segment exactly complementary 
to each variant sequence assigned a particular designation. 
The array is then contacted with a target sequence comprising 
a variant of the reference sequence. The relative 
hybridization intensities of the pools in the array to the 
target sequence are determined. The identity of the target 
sequence is deduced from the pattern of hybridization 
intensities. Often, each variant is assigned a designation 
having at least one digit and at least one value for the 
digit. In this case, each pool comprises a probe comprising a 
segment exactly complementary to each variant sequence 
assigned a particular value in a particular digit. When 
variants are assigned successive numbers in a numbering system
of base m having n digits, n x (m-1) pooled probes are used
to assign each variant a designation.
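
The pooling scheme can be made concrete with a small sketch. It assumes, as a hypothetical convention, that variants are numbered from 1 (so no variant has an all-zero designation) and that one pool is made for each nonzero value of each digit, giving n x (m-1) pools. The function names are illustrative only.

```python
# Illustrative sketch only; numbering from 1 and all names are assumptions.
def to_digits(number, base, n_digits):
    """Return the digits of number in the given base, least significant first."""
    digits = []
    for _ in range(n_digits):
        number, digit = divmod(number, base)
        digits.append(digit)
    return digits


def build_pools(variants, base, n_digits):
    """Map (digit index, digit value) to the variants pooled in that cell."""
    if len(variants) >= base ** n_digits:
        raise ValueError("too many variants for this base and digit count")
    pools = {(d, v): [] for d in range(n_digits) for v in range(1, base)}
    for number, variant in enumerate(variants, start=1):
        for d, v in enumerate(to_digits(number, base, n_digits)):
            if v != 0:
                pools[(d, v)].append(variant)
    return pools


def decode(lit_pools, base):
    """Recover a variant's number from the pools that hybridized."""
    return sum(v * base ** d for d, v in lit_pools)


if __name__ == "__main__":
    variants = ["v1", "v2", "v3", "v4", "v5", "v6", "v7"]
    pools = build_pools(variants, base=2, n_digits=3)
    print(len(pools), pools)
```

With base 2 and three digits, three pools suffice to distinguish seven variants: a target hybridizing to the pools for digits 0 and 2 only carries designation 5.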

In a sixth embodiment, the invention provides a pooled 
probe for trellis tiling, a further species of the general 
tiling strategy. In trellis tiling, the identity of a 
nucleotide in a target sequence is determined from a 
comparison of hybridization intensities of three pooled 
trellis probes. A pooled trellis probe comprises a segment 
exactly complementary to a subsequence of a reference sequence 
except at a first interrogation position occupied by a pooled 
nucleotide N, a second interrogation position occupied by a
pooled nucleotide selected from the group of three consisting 
of (1) M or K, (2) R or Y and (3) S or W, and a third 
interrogation position occupied by a second pooled nucleotide 
selected from the group. The pooled nucleotide occupying the
second interrogation position comprises a nucleotide 
complementary to a corresponding nucleotide from the reference 
sequence when the second pooled probe and reference sequence 
are maximally aligned, and the pooled nucleotide occupying the 
third interrogation position comprises a nucleotide 
complementary to a corresponding nucleotide from the reference 
sequence when the third pooled probe and the reference
sequence are maximally aligned. Standard IUPAC nomenclature
is used for describing pooled nucleotides. 
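
For readers unfamiliar with the IUPAC codes, the following sketch expands a pooled probe into the individual sequences it represents. The code table covers only the symbols used in the text above (N, M, K, R, Y, S, W) plus the four bases; the function name and the example probe are hypothetical.

```python
# Illustrative sketch only; the function name and example probe are assumptions.
from itertools import product

# Standard IUPAC nucleotide codes used in the text above.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "K": "GT",
    "R": "AG", "Y": "CT",
    "S": "CG", "W": "AT",
    "N": "ACGT",
}


def expand(pooled_probe):
    """Return every individual sequence represented by a pooled probe."""
    choices = [IUPAC[symbol] for symbol in pooled_probe]
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    # A made-up probe with one N position and two two-fold pooled positions.
    print(len(expand("ACNTMGRT")))
```

A pooled trellis probe with one N position and two two-fold pooled positions therefore represents 4 x 2 x 2 = 16 individual sequences.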

In trellis tiling, an array comprises at least first,
second and third cells, respectively occupied by first, second
and third pooled probes, each according to the generic
description above. However, the segment of complementarity,
location of interrogation positions, and selection of pooled
nucleotide at each interrogation position may or may not
differ between the three pooled probes subject to the
following constraint. One of the three interrogation
positions in each of the three pooled probes must align with
the same corresponding nucleotide in the reference sequence.
This interrogation position must be occupied by an N in one of
the pooled probes, and a different pooled nucleotide in each
of the other two pooled probes.

In a seventh embodiment, the invention provides arrays
for bridge tiling. Bridge tiling is a species of the general
tiling strategies noted above, in which probes from the first
probe set contain more than one segment of complementarity.
In bridge tiling, a nucleotide in a reference sequence is
usually determined from a comparison of four probes. A first
probe comprises at least first and second segments, each of at
least three nucleotides and each exactly complementary to
first and second subsequences of a reference sequence. The
segments include at least one interrogation position
corresponding to a nucleotide in the reference sequence.
Either (1) the first and second subsequences are noncontiguous
in the reference sequence, or (2) the first and second
subsequences are contiguous and the first and second segments
are inverted relative to the first and second subsequences.
The array further comprises second, third and fourth probes,
which are identical to a sequence comprising the first probe
or a subsequence thereof comprising at least three nucleotides
from each of the first and second segments, except in the at
least one interrogation position, which differs in each of the
probes. In a species of bridge tiling, referred to as
deletion tiling, the first and second subsequences are
separated by one or two nucleotides in the reference sequence.

In an eighth embodiment, the invention provides arrays of
probes for multiplex tiling. Multiplex tiling is a strategy
in which the identity of two nucleotides in a target sequence
is determined from a comparison of the hybridization
intensities of four probes, each having two interrogation
positions. Each of the probes comprises a segment of at
least 7 nucleotides that is exactly complementary to a
subsequence from a reference sequence, except that the segment
may or may not be exactly complementary at two interrogation
positions. The nucleotides occupying the interrogation
positions are selected by the following rules: (1) the first
interrogation position is occupied by a different nucleotide 
in each of the four probes, (2) the second interrogation 
position is occupied by a different nucleotide in each of the 
four probes, (3) in first and second probes, the segment is 
exactly complementary to the subsequence, except at no more 
than one of the interrogation positions, (4) in third and 
fourth probes, the segment is exactly complementary to the 
subsequence, except at both of the interrogation positions. 
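
One way to satisfy the four rules above is sketched below. This is only one valid assignment among several, chosen for simplicity; the function name and the order in which the non-complementary bases are handed out are hypothetical.

```python
# Illustrative sketch only; one possible assignment satisfying rules (1)-(4).
BASES = "ACGT"


def multiplex_probes(segment, pos1, pos2):
    """Return four probes for multiplex tiling.

    segment is the segment exactly complementary to the reference
    subsequence; pos1 and pos2 are the two distinct interrogation positions.
    """
    if pos1 == pos2:
        raise ValueError("interrogation positions must differ")
    a, b = segment[pos1], segment[pos2]
    others1 = [x for x in BASES if x != a]
    others2 = [x for x in BASES if x != b]
    pairs = [
        (a, others2[0]),           # probe 1: mismatch at pos2 only
        (others1[0], b),           # probe 2: mismatch at pos1 only
        (others1[1], others2[1]),  # probe 3: mismatch at both positions
        (others1[2], others2[2]),  # probe 4: mismatch at both positions
    ]
    probes = []
    for base1, base2 in pairs:
        chars = list(segment)
        chars[pos1], chars[pos2] = base1, base2
        probes.append("".join(chars))
    return probes


if __name__ == "__main__":
    for probe in multiplex_probes("TGCAACGTT", 3, 5):
        print(probe)
```

Each interrogation position carries all four nucleotides across the four probes, so a single set of four probes reports on two target nucleotides at once.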

In a ninth embodiment, the invention provides arrays of 
immobilized probes including helper mutations. Helper 
mutations are useful for, e.g., preventing self-annealing of
probes having inverted repeats. In this strategy, the 
identity of a nucleotide in a target sequence is usually 
determined from a comparison of four probes. A first probe
comprises a segment of at least 7 nucleotides exactly 
complementary to a subsequence of a reference sequence except 
at one or two positions, the segment including an 
interrogation position not at the one or two positions. The 
one or two positions are occupied by helper mutations. 
Second, third and fourth mutant probes are each identical to a 
sequence comprising the wildtype probe or a subsequence 
thereof including the interrogation position and the one or 
two positions, except in the interrogation position, which is 
occupied by a different nucleotide in each of the four probes. 

In a tenth embodiment, the invention provides arrays of
probes comprising at least two probe sets, but lacking a probe
set comprising probes that are perfectly matched to a
reference sequence. Such arrays are usually employed in
methods in which both reference and target sequence are
hybridized to the array. The first probe set comprises a
plurality of probes, each probe comprising a segment exactly
complementary to a subsequence of at least 3 nucleotides of a
reference sequence except at an interrogation position. The
second probe set comprises a corresponding probe for each
probe in the first probe set, the corresponding probe in the
second probe set being identical to a sequence comprising the
corresponding probe from the first probe set or a subsequence
of at least three nucleotides thereof that includes the
interrogation position, except that the interrogation position
is occupied by a different nucleotide in each of the two
corresponding probes and the complement to the reference
sequence.

In an eleventh embodiment, the invention provides methods
of comparing a target sequence with a reference sequence
comprising a predetermined sequence of nucleotides using any
of the arrays described above. The methods comprise
hybridizing the target nucleic acid to an array and
determining which probes, relative to one another, in the
array bind specifically to the target nucleic acid. The
relative specific binding of the probes indicates whether the
target sequence is the same or different from the reference
sequence. In some such methods, the target sequence has a
substituted nucleotide relative to the reference sequence in
at least one undetermined position, and the relative specific
binding of the probes indicates the location of the position
and the nucleotide occupying the position in the target
sequence. In some methods, a second target nucleic acid is
also hybridized to the array. The relative specific binding
of the probes then indicates both whether the target sequence
is the same or different from the reference sequence, and
whether the second target sequence is the same or different
from the reference sequence. In some methods, when the array
comprises two groups of probes tiled for first and second
reference sequences, respectively, the relative specific
binding of probes in the first group indicates whether the
target sequence is the same or different from the first
reference sequence. The relative specific binding of probes 
in the second group indicates whether the target sequence is 
the same or different from the second reference sequence. 
Such methods are particularly useful for analyzing 
heterologous alleles of a gene. Some methods entail 
hybridizing both a reference sequence and a target sequence to 
any of the arrays of probes described above. Comparison of 
the relative specific binding of the probes to the reference 
and target sequences indicates whether the target sequence is 
the same or different from the reference sequence. 
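
The comparison of relative specific binding can be sketched as a simple base-calling routine. This is a hedged illustration, not the method of the specification: the intensity values are made up, each column is assumed to be keyed by the target base that its probe detects, and the 1.2 ratio threshold for an unambiguous call is an arbitrary assumption.

```python
# Illustrative sketch only; threshold, names and data are assumptions.
def call_base(intensities, min_ratio=1.2):
    """Call the base whose probe is brightest, or "N" if the call is ambiguous.

    intensities maps each of A, C, G, T to the hybridization intensity of
    the probe that detects that base in the target.
    """
    ranked = sorted(intensities.items(), key=lambda item: item[1], reverse=True)
    (best, top), (_, second) = ranked[0], ranked[1]
    if top <= 0 or top < min_ratio * second:
        return "N"
    return best


def compare(reference, columns, min_ratio=1.2):
    """Return the called sequence and the positions that differ from reference."""
    calls = "".join(call_base(column, min_ratio) for column in columns)
    differences = [
        (i, ref_base, called)
        for i, (ref_base, called) in enumerate(zip(reference, calls))
        if called != "N" and called != ref_base
    ]
    return calls, differences


if __name__ == "__main__":
    columns = [
        {"A": 100, "C": 10, "G": 8, "T": 5},
        {"A": 9, "C": 90, "G": 7, "T": 6},
        {"A": 50, "C": 48, "G": 5, "T": 4},
    ]
    print(compare("AGT", columns))
```

Positions reported in the second return value are those where the target is called with confidence and differs from the reference; ambiguous positions are left as "N" rather than guessed.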

In a twelfth embodiment, the invention provides arrays of 
immobilized probes in which the probes are designed to tile a 
reference sequence from a human immunodeficiency virus.
Reference sequences from either the reverse transcriptase gene 
or protease gene of HIV are of particular interest. Some 
chips further comprise arrays of probes tiling a reference 
sequence from a 16S RNA or DNA encoding the 16S RNA from a 
pathogenic microorganism. The invention further provides 
methods of using such arrays in analyzing a HIV target 
sequence. The methods are particularly useful where the 
target sequence has a substituted nucleotide relative to the 
reference sequence in at least one position, the substitution 
conferring resistance to a drug used in treating a patient
infected with an HIV virus. The methods reveal the existence
of the substituted nucleotide. The methods are also 
particularly useful for analyzing a mixture of undetermined 
proportions of first and second target sequences from 
different HIV variants. The relative specific binding of 
probes indicates the proportions of the first and second 
target sequences.

In a thirteenth embodiment, the invention provides arrays 
of probes tiled based on a reference sequence from a CFTR gene.
A preferred array comprises at least a group of probes 
comprising a wildtype probe, and five sets of three mutant 
probes. The wildtype probe is exactly complementary to a 
subsequence of a reference sequence from a cystic fibrosis 
gene, the segment having at least five interrogation positions
corresponding to five contiguous nucleotides in the reference
sequence. The probes in the first set of three mutant probes
are each identical to the wildtype probe, except in a first of
the five interrogation positions, which is occupied by a
different nucleotide in each of the three mutant probes and
the wildtype probe. The probes in the second set of three
mutant probes are each identical to the wildtype probe, except
in a second of the five interrogation positions, which is
occupied by a different nucleotide in each of the three mutant
probes and the wildtype probe. The probes in the third set of
three mutant probes are each identical to the wildtype probe,
except in a third of the five interrogation positions, which
is occupied by a different nucleotide in each of the three
mutant probes and the wildtype probe. The probes in the
fourth set of three mutant probes are each identical to the
wildtype probe, except in a fourth of the five interrogation
positions, which is occupied by a different nucleotide in each
of the three mutant probes and the wildtype probe. The probes
in the fifth set of three mutant probes are each identical to
the wildtype probe, except in a fifth of the five
interrogation positions, which is occupied by a different
nucleotide in each of the three mutant probes and the wildtype
probe. Preferably, a chip comprises two such groups of
probes. The first group comprises a wildtype probe exactly
complementary to a first reference sequence, and the second
group comprises a wildtype probe exactly complementary to a
second reference sequence that is a mutated form of the first
reference sequence.

The invention further provides methods of using the
arrays of the invention for analyzing target sequences from a
CFTR gene. The methods are capable of simultaneously
analyzing first and second target sequences representing
heterozygous alleles of a CFTR gene.

In a fourteenth embodiment, the invention provides arrays
of probes tiling a reference sequence from a p53 gene, an
hMLH1 gene and/or an MSH2 gene. The invention further
provides methods of using the arrays described above to

Fig. 7: Block tiling strategy. The probe from the first
probe set has three interrogation positions. The probes from
the other probe sets have only one of these interrogation
positions.

Fig. 8: Multiplex tiling strategy. Each probe has two
interrogation positions.

Fig. 9: Helper mutation strategy. The segment of
complementarity differs from the complement of the reference
sequence at a helper mutation as well as the interrogation
position.

Fig. 10: Layout of probes on the HV 407 chip. The figure
shows successive rows of sequence each of which is subdivided
into four lanes. The four lanes correspond to the A-, C-, G-
and T-lanes on the chip. Each probe is represented by the
nucleotide occupying its interrogation position. The letter
"N" indicates a control probe or empty column. The different
sized-probes are laid out in parallel. That is, from top-to-bottom,
a row of 13 mers is followed by a row of 15 mers,
which is followed by a row of 17 mers, which is followed by a
row of 19 mers.

Fig. 11: Fluorescence pattern of HV 407 hybridized to a
target sequence (pPol19) identical to the chip's reference
sequence.

Fig. 12: Sequence read from HV 407 chip hybridized to
pPol19 and 4MUT18 (separate experiments). The reference
sequence is designated "wildtype." Beneath the reference
sequence are four rows of sequence read from the chip
hybridized to the pPol19 target, the first row being read from
13 mers, the second row from 15 mers, the third row from 17
mers and the fourth row from 19 mers. Beneath these
sequences, there are four further rows of sequence read from
the chip hybridized to the HXB2 target. Successive rows are
read from 13 mers, 15 mers, 17 mers and 19 mers. Each
nucleotide in a row is called from the relative fluorescence
intensities of probes in A-, C-, G- and T-lanes. Regions of
ambiguous sequence read from the chip are highlighted. The
strain differences between the HXB2 sequence and the reference
sequence that were correctly detected are indicated (*), and
those that could not be called are indicated (o). (The
nucleotide at position 417 was read correctly in some
experiments). The location of some mutations known to be
associated with drug resistance that occur in readable regions
of the chip are shown above (codon number) and below (mutant
nucleotide) the sequence designated "wildtype." The [...]
amplify target [...] indicated [...].



Fig. 13: Detection of mixed target sequences. The
mutant target differs from the wildtype [...] codon 67 of the
reverse transcriptase gene. Each different-sized group of
probes has a column of four probes for reading the nucleotide
in which the mutation occurs. The four probes occupying a
column are represented by a [...] figure with the symbol (o)
indicating the interrogation position, which is occupied by a
different nucleotide in each probe.

Fig. 14: [...] different proportions of mutant and
wildtype target. The fluorescence intensities are from probes
having interrogation positions for reading the nucleotide at
which the mutant and wildtype targets diverge.

Fig. 15: Sequence read from protease chip from four
clinical samples before and after treatment with ddI.

Fig. 16: Block tiling array of probes for analyzing a
CFTR point mutation. Each probe shown actually represents four
probes, with one probe having each of A, C, G or T at
interrogation position N. In the order shown, the first probe
shown on the left is tiled from the wildtype reference
sequence, the second probe from the mutant sequence and so on
in alternating fashion. Note that all of the probes are
identical except at the interrogation position, which shifts
one position between successive probes tiled from the same
reference sequence (e.g., the first, third and fifth probes in
the left hand column.) The grid shows the hybridization
intensities when the array is hybridized to the reference
sequence.

Fig. 17: Hybridization pattern for heterozygous target.
The figure shows the hybridization pattern when the array of
the previous figure is hybridized to a mixture of mutant and
wildtype reference sequences.

Fig. 18, in panels A, B, and C, shows an image made from
the region of a DNA chip containing CFTR exon 10 probes; in
panel A, the chip was hybridized to a wild-type target; in
panel C, the chip was hybridized to a mutant ΔF508 target; and
in panel B, the chip was hybridized to a mixture of the
wild-type and mutant targets.

Fig. 19, in sheets 1-3, corresponding to panels A, B,
and C of Fig. 18, shows graphs of fluorescence intensity
versus tiling position. The labels on the horizontal axis
show the bases in the wild-type sequence corresponding to the
position of substitution in the respective probes. Plotted
are the intensities observed from the features (or synthesis
sites) containing wild-type probes, the features containing
the substitution probes that bound the most target ("called"),
and the feature containing the substitution probes that bound
the target with the second highest intensity of all the
substitution probes ("2nd Highest").

Fig. 20, in panels A, B, and C, shows an image made from
a region of a DNA chip containing CFTR exon 10 probes; in
panel A, the chip was hybridized to the wt480 target; in panel
C, the chip was hybridized to the mu480 target; and in panel
B, the chip was hybridized to a mixture of the wild-type and
mutant targets.

Fig. 21, in sheets 1-3, corresponding to panels A, B,
and C of Fig. 20, shows graphs of fluorescence intensity
versus tiling position. The labels on the horizontal axis
show the bases in the wild-type sequence corresponding to the
position of substitution in the respective probes. Plotted
are the intensities observed from the features (or synthesis
sites) containing wild-type probes, the features containing
the substitution probes that bound the most target ("called"),
and the feature containing the substitution probes that bound
the target with the second highest intensity of all the
substitution probes ("2nd Highest").

Fig. 22, in panels A and B, shows an image made from a
region of a DNA chip containing CFTR exon 10 probes; in panel
A, the chip was hybridized to nucleic acid derived from the
genomic DNA of an individual with wild-type ΔF508 sequences;
in panel B, the target nucleic acid originated from a
heterozygous (with respect to the ΔF508 mutation) individual.

Fig. 23, in sheets 1 and 2, corresponding to panels A and
B of Fig. 22, shows graphs of fluorescence intensity versus
tiling position. The labels on the horizontal axis show the
bases in the wild-type sequence corresponding to the position
of substitution in the respective probes. Plotted are the
intensities observed from the features (or synthesis sites)
containing wild-type probes, the features containing the
substitution probes that bound the most target ("called"), and
the feature containing the substitution probes that bound the
target with the second highest intensity of all the
substitution probes ("2nd Highest").

Fig. 24: Hybridization of homozygous wildtype (A) and
heterozygous (B) target sequences from exon 11 of the CFTR
gene to a block tiling array designed to detect G551D and
Q552X mutations in the CFTR gene.

Fig. 25: Hybridization of homozygous wildtype (A) and
ΔF508 mutant (B) target sequences from exon 10 of the CFTR
gene to a block tiling array designed to detect mutations
ΔF508, ΔI507 and F508C.

Fig. 26: Hybridization of heterozygous mutant target
sequences, ΔF508/F508C, to the array of Fig. 25.

Fig. 27 shows the alignment of some of the probes on a
p53 DNA chip with a 12-mer model target nucleic acid.

Fig. 28 shows a set of 10-mer probes for a p53 exon 6 DNA
chip.

Fig. 29 shows that very distinct patterns are observed
after hybridization of p53 DNA chips with targets having
different 1 base substitutions. In the first image in Fig.
29, the 12-mer probes that form perfect matches with the
wild-type target are in the first row (top). The 12-mer
probes with single base mismatches are located in the second,
third, and fourth rows and have much lower signals.

Fig. 30, in graphs 2, 3, and 4, graphically depicts the
data in Fig. 29. On each graph, the X ordinate is the
position of the probe in its row on the chip, and the Y
ordinate is the signal at that probe site after hybridization.

Fig. 31 shows the results of hybridizing mixed target
populations of WT and mutant p53 genes to the p53 DNA chip.

Fig. 32, in graphs 1-4, shows (see Fig. 30 as well) the
hybridization efficiency of a 10-mer probe array as compared
to a 12-mer probe array.

Fig. 33 shows an image of a p53 DNA chip hybridized to a
target DNA.

Fig. 34 illustrates how the actual sequence was read from
the chip shown in Fig. 33. Gaps in the sequence of letters in
the WT rows correspond to control probes or sites. Positions
at which bases are miscalled are represented by letters in
italic type in cells corresponding to probes in which the WT
bases have been substituted by other bases.

Fig. 35 shows the human mitochondrial genome; "O_H" is the
H strand origin of replication, and arrows indicate the cloned
unshaded sequence.

Fig. 36 shows the image observed from application of a
sample of mitochondrial DNA derived nucleic acid (from the mt4
sample) on a DNA chip.

Fig. 37 is similar to Fig. 36 but shows the image
observed from the mt5 sample.

Fig. 38 shows the predicted difference image between the
mt4 and mt5 samples on the DNA chip based on mismatches
between the two samples and the reference sequence.

Fig. 39 shows the actual difference image observed for
the mt4 and mt5 samples.

Fig. 40, in sheets 1 and 2, shows a plot of normalized 
intensities across rows 10 and 11 of the array and a 
tabulation of the mutations detected. 

Fig. 41 shows the discrimination between wild-type and 
mutant hybrids obtained with the chip. A median of the six 
normalized hybridization scores for each probe was taken; the 
graph plots the ratio of the median score to the normalized 
hybridization score versus mean counts. A ratio of 1.6 and 
mean counts above 50 yield no false positives. 

Fig. 42 illustrates how the identity of the base mismatch 
may influence the ability to discriminate mutant and wild-type 
sequences more than the position of the mismatch within an 
oligonucleotide probe. The mismatch position is expressed as 
% of probe length from the 3'-end. The base change is 
indicated on the graph. 

Fig. 43 provides a 5' to 3' sequence listing of one 
target corresponding to the probes on the chip. X is a 
control probe. Positions that differ in the target (i.e., are 
mismatched with the probe at the designated site) are in bold. 

Fig. 44 shows the fluorescence image produced by scanning 
the chip described in Fig. 17 when hybridized to a sample. 

Fig. 45 illustrates the detection of 4 transitions in the 
target sequence relative to the wild-type probes on the chip 
in Fig. 44. 

Fig. 46: VLSIPS™ technology applied to the light 
directed synthesis of oligonucleotides. Light (hν) is shone 
through a mask (M1) to activate functional groups (-OH) on a 
surface by removal of a protecting group (X). Nucleoside 
building blocks protected with photoremovable protecting 
groups (T-X, C-X) are coupled to the activated areas. By 
repeating the irradiation and coupling steps, very complex 
arrays of oligonucleotides can be prepared. 

Fig. 47: Use of the VLSIPS™ process to prepare 
"nucleoside combinatorials" or oligonucleotides synthesized by 
coupling all four nucleosides to form dimers, trimers, and so 
forth. 

Fig. 48: Deprotection, coupling, and oxidation steps of 
a solid phase DNA synthesis method. 

Fig. 49: An illustrative synthesis route for the 
nucleoside building blocks used in the VLSIPS™ method. 

Fig. 50: A preferred photoremovable protecting group, 
MeNPOC, and preparation of the group in active form. 

Fig. 51: Detection system for scanning a DNA chip. 

DETAILED DESCRIPTION OF THE INVENTION 
The invention provides a number of strategies for 
comparing a polynucleotide of known sequence (a reference 
sequence) with variants of that sequence (target sequences). 
The comparison can be performed at the level of entire 
genomes, chromosomes, genes, exons or introns, or can focus on 
individual mutant sites and immediately adjacent bases. The 
strategies allow detection of variations, such as mutations or 
polymorphisms, in the target sequence irrespective of whether a 
particular variant has previously been characterized. The 
strategies both define the nature of a variant and identify 
its location in a target sequence. 

The strategies employ arrays of oligonucleotide probes 
immobilized to a solid support. Target sequences are analyzed 
by determining the extent of hybridization at particular 
probes in the array. The strategy in selection of probes 
facilitates distinction between perfectly matched probes and 
probes showing single-base or other degrees of mismatches. 
The strategy usually entails sampling each nucleotide of 
interest in a target sequence several times, thereby achieving 
a high degree of confidence in its identity. This level of 
confidence is further increased by sampling nucleotides in the 
target sequence adjacent to the nucleotides of interest. 
The number of probes on the chip can be quite large (e.g., 
10^5-10^6). However, usually, only a small proportion of the 
total number of probes of a given length are represented. 
Some advantages of the use of only a small proportion of all 
possible probes of a given length include: (i) each position 
in the array is highly informative, whether or not 
hybridization occurs; (ii) nonspecific hybridization is 
minimized; (iii) it is straightforward to correlate 
hybridization differences with sequence differences, 
particularly with reference to the hybridization pattern of a 
known standard; and (iv) the ability to address each probe 
independently during synthesis, using high resolution 
photolithography, allows the array to be designed and 
optimized for any sequence. For example, the length of any 
probe can be varied independently of the others. 

The present tiling strategies result in sequencing and 
comparison methods suitable for routine large-scale practice 
with a high degree of confidence in the sequence output. 

I. GENERAL TILING STRATEGIES 

A. Selection of Reference Sequence 

The chips are designed to contain probes exhibiting 
complementarity to one or more selected reference sequences 
whose sequence is known. The chips are used to read a target 
sequence comprising either the reference sequence itself or 
variants of that sequence. Target sequences may differ from 
the reference sequence at one or more positions but show a 
high overall degree of sequence identity with the reference 
sequence (e.g., at least 75, 90, 95, 99, 99.9 or 99.99%). Any 
polynucleotide of known sequence can be selected as a 
reference sequence. Reference sequences of interest include 
sequences known to include mutations or polymorphisms 
associated with phenotypic changes having clinical 
significance in human patients. For example, the CFTR gene 
and P53 gene in humans have been identified as the location of 
several mutations resulting in cystic fibrosis or cancer 
respectively. Other reference sequences of interest include 
those that serve to identify pathogenic microorganisms and/or 
are the site of mutations by which such microorganisms acquire 
drug resistance (e.g., the HIV reverse transcriptase gene). 
Other reference sequences of interest include regions where 
polymorphic variations are known to occur (e.g., the D-loop 
region of mitochondrial DNA). These reference sequences have 
utility for, e.g., forensic or epidemiological studies. Other 
reference sequences of interest include p34 (related to p53), 
p65 (implicated in breast, prostate and liver cancer), and DNA 
segments encoding cytochromes P450 (see Meyer et al., Pharmac. 
Ther. 46, 349-355 (1990)). Other reference sequences of 
interest include those from the genome of pathogenic viruses 
(e.g., hepatitis (A, B, or C), herpes virus (e.g., VZV, HSV-1, 
HAV-6, HSV-II, and CMV, Epstein Barr virus), adenovirus, 
influenza virus, flaviviruses, echovirus, rhinovirus, 
coxsackie virus, coronavirus, respiratory syncytial virus, 
mumps virus, rotavirus, measles virus, rubella virus, 
parvovirus, vaccinia virus, HTLV virus, dengue virus, 
papillomavirus, molluscum virus, poliovirus, rabies virus, JC 
virus and arboviral encephalitis virus). Other reference 
sequences of interest are from genomes or episomes of 
pathogenic bacteria, particularly regions that confer drug 
resistance or allow phylogenic characterization of the host 
(e.g., 16S rRNA or corresponding DNA). For example, such 
bacteria include chlamydia, rickettsial bacteria, 
mycobacteria, staphylococci, streptococci, pneumococci, 
meningococci and gonococci, Klebsiella, proteus, serratia, 
pseudomonas, legionella, diphtheria, salmonella, bacilli, 
cholera, tetanus, botulism, anthrax, plague, leptospirosis, 
and Lyme disease bacteria. Other reference sequences of 
interest include those in which mutations result in the 
following autosomal recessive disorders: sickle cell anemia, 
β-thalassemia, phenylketonuria, galactosemia, Wilson's 
disease, hemochromatosis, severe combined immunodeficiency, 
alpha-1-antitrypsin deficiency, albinism, alkaptonuria, 
lysosomal storage diseases and Ehlers-Danlos syndrome. Other 
reference sequences of interest include those in which 
mutations result in X-linked recessive disorders: hemophilia, 
glucose-6-phosphate dehydrogenase deficiency, 
agammaglobulinemia, diabetes insipidus, Lesch-Nyhan syndrome, 
muscular dystrophy, Wiskott-Aldrich syndrome, Fabry's disease 
and fragile X syndrome. Other reference sequences of interest 
include those in which mutations result in the following 
autosomal dominant disorders: familial hypercholesterolemia, 
polycystic kidney disease, Huntington's disease, hereditary 
spherocytosis, Marfan's syndrome, von Willebrand's disease, 
neurofibromatosis, tuberous sclerosis, hereditary hemorrhagic 
telangiectasia, familial colonic polyposis, Ehlers-Danlos 
syndrome, myotonic dystrophy, muscular dystrophy, osteogenesis 
imperfecta, acute intermittent porphyria, and von Hippel-Lindau 
disease. 

The length of a reference sequence can vary widely from a 
full-length genome, to an individual chromosome, episome, 
gene, component of a gene, such as an exon, intron or 
regulatory sequences, to a few nucleotides. A reference 
sequence of between about 2, 5, 10, 20, 50, 100, 500, 1000, 
5,000 or 10,000, 20,000 or 100,000 nucleotides is common. 
Sometimes only particular regions of a sequence (e.g., exons 
of a gene) are of interest. In such situations, the 
particular regions can be considered as separate reference 
sequences or can be considered as components of a single 
reference sequence, as a matter of arbitrary choice. 

A reference sequence can be any naturally occurring, 
mutant, consensus or purely hypothetical sequence of 
nucleotides, RNA or DNA. For example, sequences can be 
obtained from computer data bases, publications or can be 
determined or conceived de novo. Usually, a reference 
sequence is selected to show a high degree of sequence 
identity to envisaged target sequences. Often, particularly 
where a significant degree of divergence is anticipated 
between target sequences, more than one reference sequence is 
selected. Combinations of wildtype and mutant reference 
sequences are employed in several applications of the tiling 
strategy. 

B. Chip Design 

1. Basic Tiling Strategy 

The basic tiling strategy provides an array of 
immobilized probes for analysis of target sequences showing a 
high degree of sequence identity to one or more selected 
reference sequences. The strategy is first illustrated for an 
array that is subdivided into four probe sets, although it 
will be apparent that in some situations, satisfactory results 
are obtained from only two probe sets. A first probe set 
comprises a plurality of probes exhibiting perfect 
complementarity with a selected reference sequence. The 
perfect complementarity usually exists throughout the length 
of the probe. However, probes having a segment or segments of 
perfect complementarity that is/are flanked by leading or 
trailing sequences lacking complementarity to the reference 
sequence can also be used. Within a segment of 
complementarity, each probe in the first probe set has at 
least one interrogation position that corresponds to a 
nucleotide in the reference sequence. That is, the 
interrogation position is aligned with the corresponding 
nucleotide in the reference sequence, when the probe and 
reference sequence are aligned to maximize complementarity 
between the two. If a probe has more than one interrogation 
position, each corresponds with a respective nucleotide in the 
reference sequence. The identity of an interrogation position 
and corresponding nucleotide in a particular probe in the 
first probe set cannot be determined simply by inspection of 
the probe in the first set. As will become apparent, an 
interrogation position and corresponding nucleotide are defined 
by the comparative structures of probes in the first probe set 
and corresponding probes from additional probe sets. 

In principle, a probe could have an interrogation 
position at each position in the segment complementary to the 
reference sequence. Sometimes, interrogation positions 
provide more accurate data when located away from the ends of 
a segment of complementarity. Thus, typically a probe having 
a segment of complementarity of length x does not contain more 
than x-2 interrogation positions. Since probes are typically 
9-21 nucleotides, and usually all of a probe is complementary, 
a probe typically has 1-19 interrogation positions. Often, the 
probes contain a single interrogation position, at or near the 
center of the probe. 

For each probe in the first set, there are, for purposes 
of the present illustration, three corresponding probes from 
three additional probe sets. See Fig. 1. Thus, there are 
four probes corresponding to each nucleotide of interest in 
the reference sequence. Each of the four corresponding probes 
has an interrogation position aligned with that nucleotide of 
interest. Usually, the probes from the three additional 
probe sets are identical to the corresponding probe from the 
first probe set with one exception. The exception is that at 
least one (and often only one) interrogation position, which 
occurs in the same position in each of the four corresponding 
probes from the four probe sets, is occupied by a different 
nucleotide in the four probe sets. For example, for an A 
nucleotide in the reference sequence, the corresponding probe 
from the first probe set has its interrogation position 
occupied by a T, and the corresponding probes from the 
additional three probe sets have their respective 
interrogation positions occupied by A, C, or G, a different 
nucleotide in each probe. Of course, if a probe from the 
first probe set comprises trailing or flanking sequences 
lacking complementarity to the reference sequences (see 
Fig. 2), these sequences need not be present in corresponding 
probes from the three additional sets. Likewise corresponding 
probes from the three additional sets can contain leading or 
trailing sequences outside the segment of complementarity that 
are not present in the corresponding probe from the first 
probe set. Occasionally, the probes from the additional three 
probe sets are identical (with the exception of interrogation 
position(s)) to a contiguous subsequence of the full 
complementary segment of the corresponding probe from the 
first probe set. In this case, the subsequence includes the 
interrogation position and usually differs from the full- 
length probe only in the omission of one or both terminal 
nucleotides from the termini of a segment of complementarity. 
That is, if a probe from the first probe set has a segment of 
complementarity of length n, corresponding probes from the 
other sets will usually include a subsequence of the segment 
of at least length n-2. Thus, the subsequence is usually at 
least 3, 4, 7, 9, 15, 21, or 25 nucleotides long, most 
typically, in the range of 9-21 nucleotides. The subsequence 
should be sufficiently long to allow a probe to hybridize 
detectably more strongly to a variant of the reference 
sequence mutated at the interrogation position than to the 
reference sequence. 
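
The construction of one column of four corresponding probes can be sketched in Python as follows. This is only an illustration under stated assumptions, not the procedure of the invention itself: the function name `probe_column`, the 0-based indexing, the single central interrogation position, and the convention of writing each probe base-for-base beneath the reference window (that is, as the complement read 3' to 5') are all choices made for the example.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def probe_column(reference, center, length=11):
    """Return the four probes that interrogate reference[center].

    Hypothetical helper: probes are written base-for-base against the
    reference window (complement, read 3'->5'), with one interrogation
    position in the middle of an odd-length probe.
    """
    if length % 2 == 0:
        raise ValueError("this sketch assumes an odd probe length")
    half = length // 2
    start = center - half
    if start < 0 or start + length > len(reference):
        raise ValueError("probe window falls outside the reference")
    window = reference[start:start + length]
    perfect = "".join(COMPLEMENT[base] for base in window)
    # One probe per possible base at the interrogation position.
    column = {base: perfect[:half] + base + perfect[half + 1:] for base in "ACGT"}
    # The first-set probe is the one whose interrogation base
    # complements the reference nucleotide.
    return column, COMPLEMENT[reference[center]]


column, first_set_base = probe_column("ACGTACGTACGTACG", center=7, length=5)
```

In this made-up reference the nucleotide at index 7 is T, so the perfectly complementary probe carries an A at its interrogation position, and the three other probes in the column carry C, G and T there.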

The probes can be oligodeoxyribonucleotides or 
oligoribonucleotides, or any modified forms of these polymers 
that are capable of hybridizing with a target nucleic acid 
sequence by complementary base-pairing. Complementary base 
pairing means sequence-specific base pairing, which includes, 
e.g., Watson-Crick base pairing as well as other forms of base 
pairing such as Hoogsteen base pairing. Modified forms 
include 2'-O-methyl oligoribonucleotides and so-called PNAs, 
in which oligodeoxyribonucleotides are linked via peptide 
bonds rather than phosphodiester bonds. The probes can be 
attached by any linkage to a support (e.g., 3', 5' or via the 
base). 3' attachment is more usual as this orientation is 
compatible with the preferred chemistry for solid phase 
synthesis of oligonucleotides. 

The number of probes in the first probe set (and as a 
consequence the number of probes in additional probe sets) 
depends on the length of the reference sequence, the number of 
nucleotides of interest in the reference sequence and the 
number of interrogation positions per probe. In general, each 
nucleotide of interest in the reference sequence requires the 
same interrogation position in the four sets of probes. 
Consider, as an example, a reference sequence of 100 
nucleotides, 50 of which are of interest, and probes each 
having a single interrogation position. In this situation, 
the first probe set requires fifty probes, each having one 
interrogation position corresponding to a nucleotide of 
interest in the reference sequence. The second, third and 
fourth probe sets each have a corresponding probe for each 
probe in the first probe set, and so each also contains a 
total of fifty probes. The identity of each nucleotide of 
interest in the reference sequence is determined by comparing 
the relative hybridization signals at four probes having 
interrogation positions corresponding to that nucleotide from 
the four probe sets. 

In some reference sequences, every nucleotide is of 
interest. In other reference sequences, only certain portions 
in which variants (e.g., mutations or polymorphisms) are 
concentrated are of interest. In other reference sequences, 
only particular mutations or polymorphisms and immediately 
adjacent nucleotides are of interest. Usually, the first 
probe set has interrogation positions selected to correspond 
to at least a nucleotide (e.g., representing a point mutation) 
and one immediately adjacent nucleotide. Usually, the probes 
in the first set have interrogation positions corresponding to 
at least 3, 10, 50, 100, 1000, or 20,000 contiguous 
nucleotides. The probes usually have interrogation positions 
corresponding to at least 5, 10, 30, 50, 75, 90, 99 or 
sometimes 100% of the nucleotides in a reference sequence. 
Frequently, the probes in the first probe set completely span 
the reference sequence and overlap with one another relative 
to the reference sequence. For example, in one common 
arrangement each probe in the first probe set differs from 
another probe in that set by the omission of a 3' base 
complementary to the reference sequence and the acquisition of 
a 5' base complementary to the reference sequence. See 
Fig. 3. 

For conceptual simplicity, the probes in a set are 
usually arranged in order of the sequence in a lane across the 
chip. A lane contains a series of overlapping probes, which 
represent, or tile across, the selected reference sequence (see 
Fig. 3). The components of the four sets of probes are 
usually laid down in four parallel lanes, collectively 
constituting a row in the horizontal direction and a series of 
4-member columns in the vertical direction. Corresponding 
probes from the four probe sets (i.e., complementary to the 
same subsequence of the reference sequence) occupy a column. 
Each probe in a lane usually differs from its predecessor in 
the lane by the omission of a base at one end and the 
inclusion of an additional base at the other end as shown in 
Fig. 3. However, this orderly progression of probes can be 
interrupted by the inclusion of control probes or omission of 
probes in certain columns of the array. Such columns serve as 
controls to orient the chip, or gauge the background, which 
can include target sequence nonspecifically bound to the chip. 

The probe sets are usually laid down in lanes such that 
all probes having an interrogation position occupied by an A 
form an A-lane, all probes having an interrogation position 
occupied by a C form a C-lane, all probes having an 
interrogation position occupied by a G form a G-lane, and all 
probes having an interrogation position occupied by a T (or U) 
form a T-lane (or a U-lane). Note that in this arrangement 
there is not a unique correspondence between probe sets and 
lanes. Thus, the probe from the first probe set is laid down 
in the A-lane, C-lane, A-lane, A-lane and T-lane for the five 
columns in Fig. 4. The interrogation position on a column of 
probes corresponds to the position in the target sequence 
whose identity is determined from analysis of hybridization to 
the probes in that column. Thus, the interrogation positions 
of the five columns respectively correspond to N1-N5 in 
Fig. 4. The interrogation position can 
be anywhere in a probe but is usually at or near the central 
position of the probe to maximize differential hybridization 
signals between a perfect match and a single-base mismatch. 
For example, for an 11 mer probe, the central position is the 
sixth nucleotide. 
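
The lane arrangement described above can be sketched as follows. This is a minimal illustration under assumptions not taken from the specification: the function name `tile_lanes`, the toy reference sequence, odd-length probes with a single central interrogation position, and probes written as the base-for-base complement of each reference window are all hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def tile_lanes(reference, length=11):
    """Tile probes across a reference and sort them into A, C, G and T lanes.

    Hypothetical helper. Each column holds four probes that differ only at
    the central interrogation position; for an 11-mer that is index 5,
    i.e. the sixth nucleotide.
    """
    half = length // 2
    lanes = {base: [] for base in "ACGT"}
    first_set_lane = []  # lane holding the perfectly complementary probe
    for start in range(len(reference) - length + 1):
        window = reference[start:start + length]
        perfect = "".join(COMPLEMENT[base] for base in window)
        for base in "ACGT":
            lanes[base].append(perfect[:half] + base + perfect[half + 1:])
        first_set_lane.append(perfect[half])
    return lanes, first_set_lane


lanes, first_set_lane = tile_lanes("CCTGTTACC", length=5)
```

For this toy reference the interrogated nucleotides are T, G, T, T and A, so the first-set probe falls in the A, C, A, A and T lanes in successive columns, which shows why probe sets and lanes do not correspond one-to-one.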

Although the array of probes is usually laid down in rows 
and columns as described above, such a physical arrangement of 
probes on the chip is not essential. Provided that the 
spatial location of each probe in an array is known, the data 
from the probes can be collected and processed to yield the 
sequence of a target irrespective of the physical arrangement 
of the probes on a chip. In processing the data, the 
hybridization signals from the respective probes can be 
reassorted into any conceptual array desired for subsequent 
data reduction whatever the physical arrangement of probes on 
the chip. 
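
Reassorting signals from arbitrary physical sites into a conceptual lane-by-column array might look like the sketch below. The `layout` mapping, the site coordinates and the intensity values are invented for illustration; the specification requires only that the location of each probe be known.

```python
def reassort(signals_by_site, layout):
    """Rearrange per-site signals into conceptual A, C, G and T lanes.

    signals_by_site: {(x, y): intensity} as read from the chip.
    layout: {(x, y): (lane, column)} recording which probe sits where.
    Both structures are hypothetical conventions for this sketch.
    """
    n_columns = 1 + max(column for _, column in layout.values())
    lanes = {base: [None] * n_columns for base in "ACGT"}
    for site, (lane, column) in layout.items():
        lanes[lane][column] = signals_by_site[site]
    return lanes


# Eight probes scattered over the chip, forming two conceptual columns.
layout = {
    (0, 0): ("A", 0), (3, 1): ("C", 0), (1, 2): ("G", 0), (2, 3): ("T", 0),
    (2, 0): ("A", 1), (0, 1): ("C", 1), (3, 2): ("G", 1), (1, 3): ("T", 1),
}
signals_by_site = {
    (0, 0): 910, (3, 1): 95, (1, 2): 120, (2, 3): 80,
    (2, 0): 60, (0, 1): 75, (3, 2): 840, (1, 3): 110,
}
lanes = reassort(signals_by_site, layout)
```

Any site missing from `layout` is simply ignored, and any conceptual cell with no probe stays `None`, which is one way to represent omitted or control columns.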

A range of lengths of probes can be employed in the 
chips. As noted above, a probe may consist exclusively of a 
complementary segment, or may have one or more complementary 
segments juxtaposed by flanking, trailing and/or intervening 
segments. In the latter situation, the total length of 
complementary segment(s) is more important than the length of 
the probe. In functional terms, the complementarity 
segment(s) of the first probe set should be sufficiently long 
to allow the probe to hybridize detectably more strongly to a 
reference sequence compared with a variant of the reference 
including a single base mutation at the nucleotide 
corresponding to the interrogation position of the probe. 
Similarly, the complementarity segment(s) in corresponding 
probes from additional probe sets should be sufficiently long 
to allow a probe to hybridize detectably more strongly to a 
variant of the reference sequence having a single nucleotide 
substitution at the interrogation position relative to the 
reference sequence. A probe usually has a single 
complementary segment having a length of at least 
3 nucleotides, and more usually at least 5, 6, 7, 8, 9, 10, 
11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 or 
30 bases exhibiting perfect complementarity (other than 
possibly at the interrogation position(s) depending on the 
probe set) to the reference sequence. In bridging strategies, 
where more than one segment of complementarity is present, 
each segment provides at least three complementary nucleotides 
to the reference sequence and the combined segments provide at 
least two segments of three or a total of six complementary 
nucleotides. As in the other strategies, the combined length 
of complementary segments is typically from 6-30 nucleotides 
and preferably from about 9-21 nucleotides. The two segments 
are often approximately the same length. Often, the probes 
(or segment of complementarity within probes) have an odd 
number of bases, so that an interrogation position can occur 
in the exact center of the probe. 

In some chips, all probes are the same length. Other 
chips employ different groups of probe sets, in which case the 
probes are of the same size within a group, but differ between 
different groups. For example, some chips have one group 
comprising four sets of probes as described above in which all 
the probes are 11 mers, together with a second group 
comprising four sets of probes in which all of the probes are 
13 mers. Of course, additional groups of probes can be added. 
Thus, some chips contain, e.g., four groups of probes having 
sizes of 11 mers, 13 mers, 15 mers and 17 mers. Other chips 
have different size probes within the same group of four probe 
sets. In these chips, the probes in the first set can vary in 
length independently of each other. Probes in the other sets 
are usually the same length as the probe occupying the same 
column from the first set. However, occasionally different 
lengths of probes can be included at the same column position 
in the four lanes. The different length probes are included 
to equalize hybridization signals from probes irrespective of 
whether A-T or C-G bonds are formed at the interrogation 
position. 

The length of probe can be important in distinguishing 
between a perfectly matched probe and probes showing a 
single-base mismatch with the target sequence. The 
discrimination is usually greater for short probes. Shorter 
probes are usually also less susceptible to formation of 
secondary structures. However, the absolute amount of target 
sequence bound, and hence the signal, is greater for longer 
probes. The probe length representing the optimum compromise 
between these competing considerations may vary depending on, 
inter alia, the GC content of a particular region of the target 
DNA sequence, secondary structure, synthesis efficiency and 
cross-hybridization. In some regions of the target, depending 
on hybridization conditions, short probes (e.g., 11 mers) may 
provide information that is inaccessible from longer probes 
(e.g., 19 mers) and vice versa. Maximum sequence information 
can be read by including several groups of different sized 
probes on the chip as noted above. However, for many regions 
of the target sequence, such a strategy provides redundant 
information in that the same sequence is read multiple times 
from the different groups of probes. Equivalent information 
can be obtained from a single group of different sized probes 
in which the sizes are selected to maximize readable sequence. 
The size of probes at different regions of the target sequence 
can be determined from, e.g., Fig. 12, which compares the 
readability of different sized probes in different regions of 
a target. The strategy of customizing probe length within a 
single group of probe sets minimizes the total number of 
probes required to read a particular target sequence. This 
leaves ample capacity for the chip to include probes to other 
reference sequences. 

The invention provides an optimization block which allows 
systematic variation of probe length and interrogation 
position to optimize the selection of probes for analyzing a 
particular nucleotide in a reference sequence. The block 
comprises alternating columns of probes complementary to the 
wildtype target and probes complementary to a specific 
mutation. The interrogation position is varied between 
columns and probe length is varied down a column. 
Hybridization of the chip to the reference sequence or the 
mutant form of the reference sequence identifies the probe 
length and interrogation position providing the greatest 
differential hybridization signal. 
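
Reading out an optimization block can be sketched as a search for the cell with the largest signal difference. The sketch assumes that "differential hybridization signal" means the absolute difference between the signal at the wildtype probe and the signal at the paired mutant probe; a ratio could be used instead. The function name, the data layout and the intensity values are all hypothetical.

```python
def best_probe_design(signals):
    """Pick the (probe length, interrogation position) with the largest
    differential signal.

    signals: {(length, position): (wildtype_probe_signal, mutant_probe_signal)}
    The absolute difference is an assumed measure, not one fixed by the text.
    """
    def differential(design):
        wildtype_signal, mutant_signal = signals[design]
        return abs(wildtype_signal - mutant_signal)

    return max(signals, key=differential)


# Invented readings after hybridizing the block to the reference sequence.
block = {
    (11, 6): (800, 200),
    (13, 7): (1000, 700),
    (15, 8): (1200, 1100),
}
choice = best_probe_design(block)
```

With these made-up readings the 11-mer with its interrogation position at nucleotide 6 is selected: its absolute signal is the lowest of the three, but it separates match from mismatch by the widest margin.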

The probes are designed to be complementary to either 
strand of the reference sequence (e.g., coding or non-coding). 
Some chips contain separate groups of probes, one 
complementary to the coding strand, the other complementary to 
the noncoding strand. Independent analysis of coding and 
noncoding strands provides largely redundant information. 
However, the regions of ambiguity in reading the coding strand 
are not always the same as those in reading the noncoding 
strand. Thus, combination of the information from coding and 
noncoding strands increases the overall accuracy of 
sequencing. 
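
One simple way to combine the two readings is sketched below. The reconciliation rule is an assumption made for illustration: agreement is accepted, an ambiguous call on one strand is filled from the other, and a conflict is left ambiguous. The specification does not prescribe a particular rule. Both call strings are assumed to have been converted to the same strand and orientation already, with "N" marking an ambiguous position.

```python
def combine_strand_calls(coding_calls, noncoding_calls):
    """Merge per-position base calls obtained from two probe groups.

    Hypothetical rule: agree -> keep; one ambiguous ("N") -> use the other;
    conflicting unambiguous calls -> "N".
    """
    if len(coding_calls) != len(noncoding_calls):
        raise ValueError("call strings must cover the same positions")
    merged = []
    for coding, noncoding in zip(coding_calls, noncoding_calls):
        if coding == noncoding:
            merged.append(coding)
        elif coding == "N":
            merged.append(noncoding)
        elif noncoding == "N":
            merged.append(coding)
        else:
            merged.append("N")
    return "".join(merged)


merged = combine_strand_calls("ACNTG", "ANGTC")
```

In this example the two ambiguous positions are each resolved by the other strand, while the final position, where the strands disagree, remains ambiguous.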

Some chips contain additional probes or groups of probes 
designed to be complementary to a second reference sequence. 
The second reference sequence is often a subsequence of the 
first reference sequence bearing one or more commonly 
occurring mutations or interstrain variations. The second 
group of probes is designed by the same principles as 
described above except that the probes exhibit complementarity 
to the second reference sequence. The inclusion of a second 
group is particularly useful for analyzing short subsequences of 
the primary reference sequence in which multiple mutations are 
expected to occur within a short distance commensurate with 
the length of the probes (i.e., two or more mutations within 9 
to 21 bases). Of course, the same principle can be extended 
to provide chips containing groups of probes for any number of 
reference sequences. Alternatively, the chips may contain 
additional probe(s) that do not form part of a tiled array as 
noted above, but rather serve as probe(s) for a conventional 
reverse dot blot. For example, the presence of a mutation can 
be detected from binding of a target sequence to a single 
oligomeric probe harboring the mutation. Preferably, an 
additional probe containing the equivalent region of the 
wildtype sequence is included as a control. 

The chips are read by comparing the intensities of 
labelled target bound to the probes in an array. 
Specifically, a comparison is performed between each lane of 
probes (e.g., A, C, G and T lanes) at each columnar position 
(physical or conceptual). For a particular columnar position, 
the lane showing the greatest hybridization signal is called 
as the nucleotide present at the position in the target 
sequence corresponding to the interrogation position in the 
probes. See Fig. 5. The corresponding position in the target 
sequence is that aligned with the interrogation position in 
corresponding probes when the probes and target are aligned to 
maximize complementarity. Of the four probes in a column, 
only one can exhibit a perfect match to the target sequence 
whereas the others usually exhibit at least a one base pair 
mismatch. The probe exhibiting a perfect match usually 
produces a substantially greater hybridization signal than the 
other three probes in the column and is thereby easily 
identified. However, in some regions of the target sequence, 
the distinction between a perfect match and a one-base 
mismatch is less clear. Thus, a call ratio is established to 
define the ratio of signal from the best hybridizing probe to 
the second best hybridizing probe that must be exceeded for a 
particular target position to be read from the probes. A high 
call ratio ensures that few if any errors are made in calling 
target nucleotides, but can result in some nucleotides being 
scored as ambiguous, which could in fact be accurately read. 
A lower call ratio results in fewer ambiguous calls, but can 
result in more erroneous calls. It has been found that at a 
call ratio of 1.2 virtually all calls are accurate. However, 
a small but significant number of bases (e.g., up to about 
10%) may have to be scored as ambiguous. 
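
The call-ratio rule can be sketched as follows. The function names, the use of "N" for an ambiguous position, the handling of zero signals and the intensity values are assumptions made for this example. The function returns the label of the winning lane; whether that label is reported directly or complemented to give the target nucleotide depends on how the lanes are labelled on a given chip.

```python
def call_column(signals, call_ratio=1.2):
    """Call one column from its four lane signals.

    signals: {"A": s, "C": s, "G": s, "T": s}. Returns the best lane if
    best / second-best exceeds call_ratio, otherwise "N" (ambiguous).
    """
    ranked = sorted(signals.items(), key=lambda item: item[1], reverse=True)
    (best_lane, best), (_, second) = ranked[0], ranked[1]
    if second <= 0:
        # Avoid dividing by zero: any positive best signal wins outright.
        return best_lane if best > 0 else "N"
    return best_lane if best / second > call_ratio else "N"


def call_sequence(columns, call_ratio=1.2):
    """Call every column in order and join the calls into one string."""
    return "".join(call_column(column, call_ratio) for column in columns)


columns = [
    {"A": 900, "C": 100, "G": 120, "T": 80},
    {"A": 500, "C": 450, "G": 50, "T": 40},
    {"A": 60, "C": 70, "G": 800, "T": 90},
]
calls = call_sequence(columns)
```

The middle column has a best to second-best ratio of about 1.11, below the 1.2 threshold, so it is scored as ambiguous. Lowering `call_ratio` to 1.1 would turn it into an A call at the cost of a greater risk of error, which is the trade-off described above.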

Although small regions of the target sequence can
sometimes be ambiguous, these regions usually occur at the
same or similar segments in different target sequences. Thus,
for precharacterized mutations, it is known in advance whether
that mutation is likely to occur within a region of
unambiguously determinable sequence.

An array of probes is most useful for analyzing the
reference sequence from which the probes were designed and
variants of that sequence exhibiting substantial sequence
similarity with the reference sequence (e.g., several
single-base mutants spaced over the reference sequence). When an
array is used to analyze the exact reference sequence from
which it was designed, one probe exhibits a perfect match to
the reference sequence, and the other three probes in the same
column exhibit single-base mismatches. Thus, discrimination
between hybridization signals is usually high and accurate
sequence is obtained. High accuracy is also obtained when an
array is used for analyzing a target sequence comprising a
variant of the reference sequence that has a single mutation
relative to the reference sequence, or several widely spaced
mutations relative to the reference sequence. At different
mutant loci, one probe exhibits a perfect match to the target,
and the other three probes occupying the same column exhibit
single-base mismatches, the difference (with respect to
analysis of the reference sequence) being the lane in which
the perfect match occurs.

For target sequences showing a high degree of divergence 
from the reference strain or incorporating several closely 
spaced mutations from the reference strain, a single group of 
probes (i.e., designed with respect to a single reference 
sequence) will not always provide accurate sequence for the 
highly variant region of this sequence. At some particular 
columnar positions, it may be that no single probe exhibits 
perfect complementarity to the target and that any comparison 
must be based on different degrees of mismatch between the 
four probes. Such a comparison does not always allow the 
target nucleotide corresponding to that columnar position to 
be called. Deletions in target sequences can be detected by 
loss of signal from probes having interrogation positions 
encompassed by the deletion. However, signal may also be lost 
from probes having interrogation positions closely proximal to 
the deletion, resulting in some regions of the target sequence
that cannot be read. Target sequences bearing insertions will
also exhibit short regions including and proximal to the
insertion that usually cannot be read.

The presence of short regions of difficult-to-read target
because of closely spaced mutations, insertions or deletions
does not prevent determination of the remaining sequence of
the target, as different regions of a target sequence are
determined independently. Moreover, such ambiguities as might
result from analysis of diverse variants with a single group
of probes can be avoided by including multiple groups of probe
sets on a chip. For example, one group of probes can be
designed based on a full-length reference sequence, and the
other groups on subsequences of the reference sequence
incorporating frequently occurring mutations or strain
variations.

A particular advantage of the present sequencing strategy
over conventional sequencing methods is the capacity
simultaneously to detect and quantify proportions of multiple
target sequences. Such capacity is valuable, e.g., for
diagnosis of patients who are heterozygous with respect to a
gene or who are infected with a virus, such as HIV, which is
usually present in several polymorphic forms. Such capacity
is also useful in analyzing targets from biopsies of tumor
cells and surrounding tissues. The presence of multiple
target sequences is detected from the relative signals of the
four probes at the array columns corresponding to the target
nucleotides at which diversity occurs. The relative signals
at the four probes for the mixture under test are compared
with the corresponding signals from a homogeneous reference
sequence. An increase in a signal from a probe that is
mismatched with respect to the reference sequence, and a
corresponding decrease in the signal from the probe which is
matched with the reference sequence, signal the presence of a
mutant strain in the mixture. The extent of the shift in
hybridization signals of the probes is related to the
proportion of a target sequence in the mixture. Shifts in
relative hybridization signals can be quantitatively related
to proportions of reference and mutant sequence by prior
calibration of the chip with seeded mixtures of the mutant and
reference sequences. By this means, a chip can be used to
detect variant or mutant strains constituting as little as 1,
5, 20, or 25% of a mixture of strains.
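
One way to relate a signal shift to a proportion is to interpolate against the seeded-mixture calibration. The sketch below is a hedged illustration only: the text does not specify the form of the calibration, so linear interpolation, the function names, and the calibration numbers shown are all assumptions chosen for the example, not measured data.

```python
from bisect import bisect_left
from typing import Sequence


def mutant_signal_fraction(mutant_signal: float, reference_signal: float) -> float:
    """Share of the two-probe signal that comes from the mutant-matched probe."""
    total = mutant_signal + reference_signal
    if total <= 0:
        raise ValueError("no hybridization signal at this column")
    return mutant_signal / total


def estimate_proportion(observed: float,
                        cal_fraction: Sequence[float],
                        cal_proportion: Sequence[float]) -> float:
    """Interpolate the mutant proportion from a calibration series.

    `cal_fraction` holds signal fractions measured on seeded mixtures
    (strictly increasing); `cal_proportion` holds the known mutant
    proportions of those mixtures. Linear interpolation is an assumption.
    """
    if observed <= cal_fraction[0]:
        return cal_proportion[0]
    if observed >= cal_fraction[-1]:
        return cal_proportion[-1]
    i = bisect_left(cal_fraction, observed)
    x0, x1 = cal_fraction[i - 1], cal_fraction[i]
    y0, y1 = cal_proportion[i - 1], cal_proportion[i]
    return y0 + (y1 - y0) * (observed - x0) / (x1 - x0)


# Hypothetical calibration points, invented for illustration only.
cal_fraction = [0.10, 0.20, 0.50, 0.90]
cal_proportion = [0.00, 0.05, 0.25, 1.00]
observed = mutant_signal_fraction(mutant_signal=30.0, reference_signal=70.0)
print(round(estimate_proportion(observed, cal_fraction, cal_proportion), 3))
```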

Similar principles allow the simultaneous analysis of 
multiple target sequences even when none is identical to the 
reference sequence. For example, with a mixture of two target 
sequences bearing first and second mutations, there would be a 
variation in the hybridization patterns of probes having 
interrogation positions corresponding to the first and second
mutations relative to the hybridization pattern with the
reference sequence. At each position, one of the probes 
having a mismatched interrogation position relative to the 
reference sequence would show an increase in hybridization 
signal, and the probe having a matched interrogation position 
relative to the reference sequence would show a decrease in 
hybridization signal. Analysis of the hybridization pattern 
of the mixture of mutant target sequences, preferably in 
comparison with the hybridization pattern of the reference 
sequence, indicates the presence of two mutant target 
sequences, the position and nature of the mutation in each 
strain, and the relative proportions of each strain.

In a variation of the above method, the different
components in a mixture of target sequences are differentially
labelled before being applied to the array. For example, a
variety of fluorescent labels emitting at different wavelengths
are available. The use of differential labels allows
independent analysis of different targets bound simultaneously
to the array. For example, the methods permit comparison of
target sequences obtained from a patient at different stages
of a disease.



2. Omission of Probes
The general strategy outlined above employs four probes
to read each nucleotide of interest in a target sequence. One
probe (from the first probe set) shows a perfect match to the
reference sequence and the other three probes (from the
second, third and fourth probe sets) exhibit a mismatch with
the reference sequence and a perfect match with a target
sequence bearing a mutation at the nucleotide of interest.
The provision of three probes from the second, third and
fourth probe sets allows detection of each of the three
possible nucleotide substitutions of any nucleotide of
interest. However, in some reference sequences or regions of
reference sequences, it is known in advance that only certain
mutations are likely to occur. Thus, for example, at one site
it might be known that an A nucleotide in the reference
sequence may exist as a T mutant in some target sequences but
is unlikely to exist as a C or G mutant. Accordingly, for
analysis of this region of the reference sequence, one might
include only the first and second probe sets, the first probe
set exhibiting perfect complementarity to the reference
sequence, and the second probe set having an interrogation
position occupied by an invariant A residue (for detecting the
T mutant). In other situations, one might include the first,
second and third probe sets (but not the fourth) for
detection of a wildtype nucleotide in the reference sequence
and two mutant variants thereof in target sequences. In some
chips, probes that would detect silent mutations (i.e., not
affecting amino acid sequence) are omitted.

In some chips, the probes from the first probe set are
omitted corresponding to some or all positions of the
reference sequences. Such chips comprise at least two probe
sets. The first probe set has a plurality of probes. Each
probe comprises a segment exactly complementary to a
subsequence of a reference sequence except in at least one
interrogation position. A second probe set has a
corresponding probe for each probe in the first probe set.
The corresponding probe in the second probe set is identical
to a sequence comprising the corresponding probe from the
first probe set or a subsequence thereof that includes the at
least one (and usually only one) interrogation position except
that the at least one interrogation position is occupied by a
different nucleotide in each of the two corresponding probes
from the first and second probe sets. A third probe set, if
present, also comprises a corresponding probe for each probe



WO 95/11995 PCT/US94/12305

35 

in the first probe set except at the at least one 
interrogation position, which differs in the corresponding 
probes from the three sets. Omission of probes having a 
segment exhibiting perfect complementarity to the reference 
sequence results in loss of control information, i.e., the 
detection of nucleotides in a target sequence that are the 
same as those in a reference sequence. However, similar 
information can be obtained by hybridizing a chip lacking 
probes from the first probe set to both target and reference 
sequences. The hybridization can be performed sequentially, 
or concurrently, if the target and reference are 
differentially labelled. In this situation, the presence of 
mutation is detected by a shift in the background 
hybridization intensity of the reference sequence to a 
perfectly matched hybridization signal of the target sequence 
rather than by a comparison of the hybridization intensities 
of probes from the first set with corresponding probes from 
the second, third and fourth sets. 

3. Wildtype Probe Lane 

When the chips comprise four probe sets, as discussed
supra, and the probe sets are laid down in four lanes, an A
lane, a C lane, a G lane and a T or U lane, the probe having a
segment exhibiting perfect complementarity to a reference
sequence varies between the four lanes from one column to
another. This does not present any significant difficulty in
computer analysis of the data from the chip. However, visual
inspection of the hybridization pattern of the chip is
sometimes facilitated by provision of an extra lane of probes,
in which each probe has a segment exhibiting perfect
complementarity to the reference sequence. See Fig. 4. This
segment is identical to a segment from one of the probes in
the other four lanes (which lane depending on the column
position). The extra lane of probes (designated the wildtype
lane) hybridizes to a target sequence at all nucleotide
positions except those in which deviations from the reference
sequence occur. The hybridization pattern of the wildtype
lane thereby provides a simple visual indication of mutations.




4. Deletion, Insertion and Multiple-Mutation Probes
Some chips provide an additional probe set specifically
designed for analyzing deletion mutations. The additional
probe set comprises a probe corresponding to each probe in the
first probe set as described above. However, a probe from the
additional probe set differs from the corresponding probe in
the first probe set in that the nucleotide occupying the
interrogation position is deleted in the probe from the
additional probe set. See Fig. 6. Optionally, the probe from
the additional probe set bears an additional nucleotide at one
of its termini relative to the corresponding probe from the
first probe set. The probe from the additional probe set will
hybridize more strongly than the corresponding probe from the
first probe set to a target sequence having a single base
deletion at the nucleotide corresponding to the interrogation
position. Additional probe sets are provided in which not
only the interrogation position, but also an adjacent
nucleotide is deleted.

Similarly, other chips provide additional probe sets for
analyzing insertions. For example, one additional probe set
has a probe corresponding to each probe in the first probe set
as described above. However, the probe in the additional
probe set has an extra T nucleotide inserted adjacent to the
interrogation position. See Fig. 6. Optionally, the probe
has one fewer nucleotide at one of its termini relative to the
corresponding probe from the first probe set. The probe from
the additional probe set hybridizes more strongly than the
corresponding probe from the first probe set to a target
sequence having an A nucleotide inserted in a position
adjacent to that corresponding to the interrogation position.
Similar additional probe sets are constructed having C, G or
T/U nucleotides inserted adjacent to the interrogation
position. Usually, four such probe sets, one for each
nucleotide, are used in combination.
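
The construction of deletion and insertion probes from a first-set probe can be sketched as simple string edits. The following is a minimal illustration under stated assumptions: probes are plain strings, the interrogation position is a zero-based index, the inserted nucleotide is placed immediately after that position, and the optional adjustment of a probe terminus is left out. The function names and the example probe are chosen for illustration.

```python
from typing import Dict

BASES = "ACGT"


def deletion_probe(first_set_probe: str, interrogation_index: int) -> str:
    """Copy of the probe with the interrogation nucleotide deleted."""
    i = interrogation_index
    return first_set_probe[:i] + first_set_probe[i + 1:]


def insertion_probes(first_set_probe: str, interrogation_index: int) -> Dict[str, str]:
    """One probe per nucleotide, each with that nucleotide inserted
    immediately after the interrogation position (an assumed convention)."""
    i = interrogation_index
    return {b: first_set_probe[:i + 1] + b + first_set_probe[i + 1:] for b in BASES}


probe = "ATTGGTGAGTGCCC"   # illustrative probe; index 5 is taken as the interrogation position
print(deletion_probe(probe, 5))
for base, seq in insertion_probes(probe, 5).items():
    print(base, seq)
```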
Other chips provide additional probes (multiple-mutation
probes) for analyzing target sequences having multiple closely
spaced mutations. A multiple-mutation probe is usually
identical to a corresponding probe from the first set as
described above, except in the base occupying the
interrogation position, and except at one or more additional
positions, corresponding to nucleotides in which substitution
may occur in the reference sequence. The one or more
additional positions in the multiple-mutation probe are
occupied by nucleotides complementary to the nucleotides
occupying corresponding positions in the reference sequence
when the possible substitutions have occurred.

5. Block Tiling 
As noted in the discussion of the general tiling
strategy, a probe in the first probe set sometimes has more
than one interrogation position. In this situation, a probe
in the first probe set is sometimes matched with multiple
groups of at least one, and usually three, additional probe
sets. See Fig. 7. Three additional probe sets are used to
allow detection of the three possible nucleotide substitutions
at any one position. If only certain types of substitution
are likely to occur (e.g., transitions), only one or two
additional probe sets are required (analogous to the use of
probes in the basic tiling strategy). To illustrate for the
situation where a group comprises three additional probe sets,
a first such group comprises second, third and fourth probe
sets, each of which has a probe corresponding to each probe in
the first probe set. The corresponding probes from the
second, third and fourth probe sets differ from the
corresponding probe in the first set at a first of the
interrogation positions. Thus, the relative hybridization
signals from corresponding probes from the first, second,
third and fourth probe sets indicate the identity of the
nucleotide in a target sequence corresponding to the first
interrogation position. A second group of three probe sets
(designated fifth, sixth and seventh probe sets) each also
has a probe corresponding to each probe in the first probe
set. These corresponding probes differ from that in the first
probe set at a second interrogation position. The relative
hybridization signals from corresponding probes from the
first, fifth, sixth, and seventh probe sets indicate the
identity of the nucleotide in the target sequence
corresponding to the second interrogation position. As noted
above, the probes in the first probe set often have seven or
more interrogation positions. If there are seven
interrogation positions, there are seven groups of three
additional probe sets, each group of three probe sets serving
to identify the nucleotide corresponding to one of the seven
interrogation positions.
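
The grouping just described can be sketched as follows. This is an illustrative sketch only: the function name block_tile, the zero-based indices, and the example probe are assumptions. For seven interrogation positions it yields one first-set probe plus seven groups of three, i.e. 22 probes, whereas four probes per nucleotide in the basic strategy would give 28.

```python
from typing import Dict, List

BASES = "ACGT"


def block_tile(first_set_probe: str, positions: List[int]) -> Dict[int, List[str]]:
    """For each interrogation position, build the group of three probes that
    differ from the first-set probe only at that position."""
    groups: Dict[int, List[str]] = {}
    for pos in positions:
        matched = first_set_probe[pos]
        groups[pos] = [
            first_set_probe[:pos] + b + first_set_probe[pos + 1:]
            for b in BASES
            if b != matched
        ]
    return groups


probe = "ATTGGTGAGTGCCC"                    # illustrative first-set probe
groups = block_tile(probe, list(range(3, 10)))   # seven central interrogation positions
total = 1 + sum(len(g) for g in groups.values())
print(total, "probes in the block")
```

Because every probe in the block shares the flanking sequence of the first-set probe, only the central segment varies.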

Each block of probes allows short regions of a target
sequence to be read. For example, for a block of probes
having seven interrogation positions, seven nucleotides in the
target sequence can be read. Of course, a chip can contain
any number of blocks depending on how many nucleotides of the
target are of interest. The hybridization signals for each
block can be analyzed independently of any other block. The
block tiling strategy can also be combined with other tiling
strategies, with different parts of the same reference
sequence being tiled by different strategies.

The block tiling strategy offers three advantages over the
basic strategy in which each probe in the first set has a
single interrogation position. One advantage is that the same
sequence information can be obtained from fewer probes. A
second advantage is that each of the probes constituting a
block (i.e., a probe from the first probe set and a
corresponding probe from each of the other probe sets) can
have identical 3' and 5' sequences, with the variation
confined to a central segment containing the interrogation
positions. The identity of 3' sequence between different
probes simplifies the strategy for solid phase synthesis of
the probes on the chip and results in more uniform deposition
of the different probes on the chip, thereby in turn
increasing the uniformity of signal to noise ratio for
different regions of the chip. A third advantage is that
greater signal uniformity is achieved within a block.

6. Multiplex Tiling

In the block tiling strategy discussed above, the
identity of a nucleotide in a target or reference sequence is
determined by comparison of hybridization patterns of one
probe having a segment showing a perfect match with that of
other probes (usually three other probes) showing a single
base mismatch. In multiplex tiling, the identity of at least
two nucleotides in a reference or target sequence is
determined by comparison of hybridization signal intensities
of four probes, two of which have a segment showing perfect
complementarity or a single base mismatch to the reference
sequence, and two of which have a segment showing perfect
complementarity or a double-base mismatch to a segment. The
four probes whose hybridization patterns are to be compared
each have a segment that is exactly complementary to a
reference sequence except at two interrogation positions, in
which the segment may or may not be complementary to the
reference sequence. The interrogation positions correspond to
the nucleotides in a reference or target sequence which are
determined by the comparison of intensities. The nucleotides
occupying the interrogation positions in the four probes are
selected according to the following rule. The first
interrogation position is occupied by a different nucleotide
in each of the four probes. The second interrogation position
is also occupied by a different nucleotide in each of the four
probes. In two of the four probes, designated the first and
second probes, the segment is exactly complementary to the
reference sequence except at not more than one of the two
interrogation positions. In other words, one of the
interrogation positions is occupied by a nucleotide that is
complementary to the corresponding nucleotide from the
reference sequence and the other interrogation position may or
may not be so occupied. In the other two of the four probes,
designated the third and fourth probes, the segment is exactly
complementary to the reference sequence except that both
interrogation positions are occupied by nucleotides which are
noncomplementary to the respective corresponding nucleotides
in the reference sequence.

There are a number of ways of satisfying these conditions,
depending on whether the two nucleotides in the reference
sequence corresponding to the two interrogation positions are
the same or different. If these two nucleotides are different
in the reference sequence (probability 3/4), the conditions
are satisfied by each of the two interrogation positions being
occupied by the same nucleotide in any given probe. For
example, in the first probe, the two interrogation positions
would both be A, in the second probe, both would be C, in the
third probe, each would be G, and in the fourth probe each
would be T or U. If the two nucleotides in the reference
sequence corresponding to the two interrogation positions are
the same, the conditions noted above are satisfied by each of
the interrogation positions in any one of the four probes
being occupied by complementary nucleotides. For example, in
the first probe, the interrogation positions could be occupied
by A and T, in the second probe by C and G, in the third probe
by G and C, and in the fourth probe, by T and A. See Fig. 8.

When the four probes are hybridized to a target that is
the same as the reference sequence or differs from the
reference sequence at one (but not both) of the interrogation
positions, two of the four probes show a double-mismatch with
the target and two probes show a single mismatch. The
identity of probes showing these different degrees of mismatch
can be determined from the different hybridization signals.
From the identity of the probes showing the different degrees
of mismatch, the nucleotides occupying both of the
interrogation positions in the target sequence can be deduced.
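
The selection rule for the two interrogation positions can be checked with a short sketch. It is illustrative only and rests on assumptions: only the two interrogation nucleotides of each probe are modelled (the rest of the segment is taken to be exactly complementary), and the function names are invented for the example. The sketch confirms that, for a target identical to the reference, two probes carry a single mismatch and two carry a double mismatch whichever pair of reference nucleotides is chosen.

```python
from typing import List, Tuple

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def multiplex_pairs(ref1: str, ref2: str) -> List[Tuple[str, str]]:
    """Nucleotides at the two interrogation positions of the four probes.

    ref1 and ref2 are the reference nucleotides aligned with the first and
    second interrogation positions.
    """
    if ref1 != ref2:
        # Different reference nucleotides: same nucleotide at both positions.
        return [(b, b) for b in BASES]
    # Identical reference nucleotides: complementary nucleotides at the two positions.
    return [(b, COMPLEMENT[b]) for b in BASES]


def mismatches(pair: Tuple[str, str], tgt1: str, tgt2: str) -> int:
    """Number of interrogation positions at which the probe fails to
    complement the target nucleotides tgt1 and tgt2."""
    return int(pair[0] != COMPLEMENT[tgt1]) + int(pair[1] != COMPLEMENT[tgt2])


for ref1 in BASES:
    for ref2 in BASES:
        profile = [mismatches(p, ref1, ref2) for p in multiplex_pairs(ref1, ref2)]
        assert sorted(profile) == [1, 1, 2, 2]
print("two single-mismatch and two double-mismatch probes in every case")
```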

For ease of illustration, the multiplex strategy has been
initially described for the situation where there are two
nucleotides of interest in a reference sequence and only four
probes in an array. Of course, the strategy can be extended
to analyze any number of nucleotides in a target sequence by
using additional probes. In one variation, each pair of
interrogation positions is read from a unique group of four
probes. In a block variation, different groups of four probes
exhibit the same segment of complementarity with the reference
sequence, but the interrogation positions move within a block.
The block and standard multiplex tiling variants can of course
be used in combination for different regions of a reference
sequence. Either or both variants can also be used in
combination with any of the other tiling strategies described.




7. Helper Mutations

Occasionally, small regions of a reference sequence give a
low hybridization signal as a result of annealing of probes.
The self-annealing reduces the amount of probe effectively
available for hybridizing to the target. Although such
regions of the target are generally small and the reduction of
hybridization signal is usually not so substantial as to
obscure the sequence of this region, this concern can be
avoided by the use of probes incorporating helper mutations.
The helper mutation(s) serve to break up regions of internal
complementarity within a probe and thereby prevent annealing.
Usually, one or two helper mutations are quite sufficient for
this purpose. The inclusion of helper mutations can be
beneficial in any of the tiling strategies noted above. In
general, each probe having a particular interrogation position
has the same helper mutation(s). Thus, such probes have a
segment in common which shows perfect complementarity with a
reference sequence, except that the segment contains at least
one helper mutation (the same in each of the probes) and at
least one interrogation position (different in all of the
probes). For example, in the basic tiling strategy, a probe
from the first probe set comprises a segment containing an
interrogation position and showing perfect complementarity
with a reference sequence except for one or two helper
mutations. The corresponding probes from the second, third
and fourth probe sets usually comprise the same segment (or
sometimes a subsequence thereof including the helper
mutation(s) and interrogation position), except that the base
occupying the interrogation position varies in each probe.

Usually, the helper mutation tiling strategy is used in
conjunction with one of the tiling strategies described above.
The probes containing helper mutations are used to tile
regions of a reference sequence otherwise giving low
hybridization signal (e.g., because of self-complementarity),
and the alternative tiling strategy is used to tile
intervening regions.

8. Pooling Strategies

Pooling strategies also employ arrays of immobilized
probes. Probes are immobilized in cells of an array, and the
hybridization signal of each cell can be determined
independently of any other cell. A particular cell may be
occupied by a pooled mixture of probes. Although the identity
of each probe in the mixture is known, the individual probes
in the pool are not separately addressable. Thus, the
hybridization signal from a cell is the aggregate of that of
the different probes occupying the cell. In general, a cell
is scored as hybridizing to a target sequence if at least one
probe occupying the cell comprises a segment exhibiting
perfect complementarity to the target sequence.

A simple strategy to show the increased power of pooled
strategies over a standard tiling is to create three cells
each containing a pooled probe having a single pooled
position, the pooled position being the same in each of the
pooled probes. At the pooled position, there are two possible
nucleotides, allowing the pooled probe to hybridize to two
target sequences. In tiling terminology, the pooled position
of each probe is an interrogation position. As will become
apparent, comparison of the hybridization intensities of the
pooled probes from the three cells reveals the identity of the
nucleotide in the target sequence corresponding to the
interrogation position (i.e., that is matched with the
interrogation position when the target sequence and pooled
probes are maximally aligned for complementarity).

The three cells are assigned probe pools that are
perfectly complementary to the target except at the pooled
position, which is occupied by a different pooled nucleotide
in each probe as follows:

[AC] = M, [GT] = K, [AG] = R as substitutions in the probe
(IUPAC standard ambiguity notation)

              X                 X = interrogation position
Target:  TAACCACTCACGGGAGCA

Pool 1:  ATTGGMGAGTGCCC
        =ATTGGaGAGTGCCC   (complement to mutant 't')
        +ATTGGcGAGTGCCC   (complement to mutant 'g')

Pool 2:  ATTGGKGAGTGCCC
        =ATTGGgGAGTGCCC   (complement to mutant 'c')
        +ATTGGtGAGTGCCC   (complement to wild type 'a')

Pool 3:  ATTGGRGAGTGCCC
        =ATTGGaGAGTGCCC   (complement to mutant 't')
        +ATTGGgGAGTGCCC   (complement to mutant 'c')

With 3 pooled probes, all 4 possible single base pair states
(wild and 3 mutants) are detected. A pool hybridizes with a
target if some probe contained within that pool is
complementary to that target.

                                  Hybridization?
                             Pool 1  Pool 2  Pool 3
Target:  TAACCACTCACGGGAGCA     n       y       n
Mutant:  TAACCcCTCACGGGAGCA     n       y       y
Mutant:  TAACCgCTCACGGGAGCA     y       n       n
Mutant:  TAACCtCTCACGGGAGCA     y       n       y

A cell containing a pair (or more) of oligonucleotides
lights up when a target complementary to any of the
oligonucleotides in the cell is present. Using the simple
strategy, each of the four possible targets (wild and three
mutants) yields a unique hybridization pattern among the three
cells.
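
The uniqueness of the three-cell pattern can be verified with a brief sketch. It is an illustration under assumptions: a pool is scored as hybridizing (True) exactly when it contains the nucleotide complementary to the target nucleotide at the interrogation position, signal intensities are ignored, and the function names are invented for the example.

```python
from typing import Dict, Optional, Tuple

# IUPAC ambiguity codes used for the three pools (M, K and R).
IUPAC = {"M": "AC", "K": "GT", "R": "AG"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
POOLS = ("M", "K", "R")


def pool_pattern(target_base: str) -> Tuple[bool, ...]:
    """Which of the three pools contains a probe complementary to the target base."""
    needed = COMPLEMENT[target_base]
    return tuple(needed in IUPAC[code] for code in POOLS)


def decode(pattern: Tuple[bool, ...]) -> Optional[str]:
    """Recover the target base from a pool hybridization pattern, if it is valid."""
    table: Dict[Tuple[bool, ...], str] = {pool_pattern(b): b for b in "ACGT"}
    return table.get(pattern)


for base in "ACGT":
    pattern = pool_pattern(base)
    print(base, ["y" if hit else "n" for hit in pattern], decode(pattern))
```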

Since a different pattern of hybridizing pools is
obtained for each possible nucleotide in the target sequence
corresponding to the pooled interrogation position in the
probes, the identity of the nucleotide can be determined from
the hybridization pattern of the pools. Whereas a standard
tiling requires four cells to detect and identify the possible
single-base substitutions at one location, this simple pooled
strategy only requires three cells.

A more efficient pooling strategy for sequence analysis
is the 'Trellis' strategy. In this strategy, each pooled
probe has a segment of perfect complementarity to a reference
sequence except at three pooled positions. One pooled
position is an N pool. The three pooled positions may or may
not be contiguous in a probe. The other two pooled positions
are selected from the group of three pools consisting of (1) M
or K, (2) R or Y and (3) W or S, where the single letters are
IUPAC standard ambiguity codes. The sequence of a pooled
probe is thus of the form XXXN[(M/K) or (R/Y) or (W/S)][(M/K)
or (R/Y) or (W/S)]XXXXX, where XXX represents bases
complementary to the reference sequence. The three pooled
positions may be in any order, and may be contiguous or
separated by intervening nucleotides. For the two positions
occupied by [(M/K) or (R/Y) or (W/S)], two choices must be
made. First, one must select one of the following three pairs
of pooled nucleotides: (1) M/K, (2) R/Y and (3) W/S. The one
of three pooled nucleotides selected may be the same or
different at the two pooled positions. Second, supposing, for
example, one selects M/K at one position, one must then choose
between M or K. This choice should result in selection of a
pooled nucleotide comprising a nucleotide that complements the
corresponding nucleotide in a reference sequence, when the
probe and reference sequence are maximally aligned. The same
principle governs the selection between R and Y, and between W
and S. A trellis pool probe has one pooled position with four
possibilities, and two pooled positions, each with two
possibilities. Thus, a trellis pool probe comprises a mixture
of 16 (4 x 2 x 2) probes. Since each pooled position includes
one nucleotide that complements the corresponding nucleotide
from the reference sequence, one of these 16 probes has a
segment that is the exact complement of the reference
sequence. A target sequence that is the same as the reference
sequence (i.e., a wildtype target) gives a hybridization
signal to each probe cell. Here, as in other tiling methods,
the segment of complementarity should be sufficiently long to
permit specific hybridization of a pooled probe to a reference
sequence to be detected relative to a variant of that reference
sequence. Typically, the segment of complementarity is about
9-21 nucleotides.
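
The count of 16 individual probes per trellis pool can be reproduced by expanding the IUPAC codes. The sketch below is illustrative; the function name expand_pool is an assumption, and the pooled probe TGGTGNKYGCCCT is the example pool given in this description.

```python
from itertools import product
from typing import List

# IUPAC codes needed for trellis pools, plus the four plain bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "K": "GT", "R": "AG", "Y": "CT",
    "W": "AT", "S": "CG", "N": "ACGT",
}


def expand_pool(pooled_probe: str) -> List[str]:
    """List every individual probe represented by a pooled probe."""
    choices = (IUPAC[code] for code in pooled_probe)
    return ["".join(bases) for bases in product(*choices)]


probes = expand_pool("TGGTGNKYGCCCT")
print(len(probes))                  # 4 x 2 x 2 individual probes
print("TGGTGAGTGCCCT" in probes)    # the member exactly complementary to the reference
```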

A target sequence is analyzed by comparing hybridization
intensities at three pooled probes, each having the structure
described above. The segments complementary to the reference
sequence present in the three pooled probes show some overlap.
Sometimes the segments are identical (other than at the
interrogation positions). However, this need not be the case.
For example, the segments can tile across a reference sequence
in increments of one nucleotide (i.e., one pooled probe
differs from the next by the acquisition of one nucleotide at
the 5' end and loss of a nucleotide at the 3' end). The three
interrogation positions may or may not occur at the same
relative positions within each pooled probe (i.e., spacing
from a probe terminus). All that is required is that one of
the three interrogation positions from each of the three
pooled probes aligns with the same nucleotide in the reference
sequence, and that this interrogation position is occupied by
a different pooled nucleotide in each of the three probes. In
one of the three probes, the interrogation position is
occupied by an N. In the other two pooled probes the
interrogation position is occupied by one of (M/K) or (R/Y) or
(W/S).

In the simplest form of the trellis strategy, three 
pooled probes are used to analyze a single nucleotide in the 
reference sequence. Much greater economy of probes is 
achieved when more pooled probes are included in an array. 
For example, consider an array of five pooled probes each
having the general structure outlined above. Three of these
pooled probes have an interrogation position that aligns with
the same nucleotide in the reference sequence and are used to
read that nucleotide. A different combination of three probes
has an interrogation position that aligns with a different
nucleotide in the reference sequence. Comparison of these
three probe intensities allows analysis of this second
nucleotide. Still another combination of three pooled probes
from the set of five has an interrogation position that
aligns with a third nucleotide in the reference sequence and
these probes are used to analyze that nucleotide. Thus, three
nucleotides in the reference sequence are fully analyzed from
only five pooled probes. By comparison, the basic tiling
strategy would require 12 probes for a similar analysis.

As an example, a pooled probe for analysis of a target 
sequence by the trellis strategy is shown below: 

Target: ATTAACCACTCACGGGAGCTCT
Pool:       TGGTGNKYGCCCT

The pooled probe actually comprises 16 individual probes:

             TGGTGAGcGCCCT
            +TGGTGcGcGCCCT
            +TGGTGgGcGCCCT
            +TGGTGtGcGCCCT
            +TGGTGAtcGCCCT
            +TGGTGctcGCCCT
            +TGGTGgtcGCCCT
            +TGGTGttcGCCCT
            +TGGTGAGTGCCCT
            +TGGTGcGTGCCCT
            +TGGTGgGTGCCCT
            +TGGTGtGTGCCCT
            +TGGTGAtTGCCCT
            +TGGTGctTGCCCT
            +TGGTGgtTGCCCT
            +TGGTGttTGCCCT

The trellis strategy employs an array of probes having at 
least three cells, each of which is occupied by a pooled probe 
as described above. 
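
The expansion of a pooled probe into its component probes can be sketched in a few lines of Python. This is only an illustration: the function name expand_pool and the lookup table are hypothetical helpers, and the table simply encodes the standard IUPAC ambiguity codes (N, M, K, R, Y, W, S) used for the pooled positions.

```python
from itertools import product

# Standard IUPAC ambiguity codes for the pooled positions.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "K": "GT", "R": "AG", "Y": "CT",
    "W": "AT", "S": "CG", "N": "ACGT",
}


def expand_pool(pooled_probe):
    """Return every component probe of a pooled probe (hypothetical helper)."""
    choices = [IUPAC[code] for code in pooled_probe]
    return ["".join(combo) for combo in product(*choices)]


# One position with four possibilities and two positions with two
# possibilities each give 4 x 2 x 2 components.
components = expand_pool("TGGTGNKYGCCCT")
```

Under these assumptions the list contains the sixteen component probes listed above, including the exact complement of the reference segment, TGGTGAGTGCCCT.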

Consider the use of three such pooled probes for
analyzing a target sequence, of which one position may contain
any single base substitution to the reference sequence (i.e.,
there are four possible target sequences to be distinguished).
Three cells are occupied by pooled probes having a pooled
interrogation position corresponding to the position of
possible substitution in the target sequence, one cell with an
'N', one cell with one of 'M' or 'K', and one cell with 'R' or
'Y'. (An interrogation position corresponds to a nucleotide in
the target sequence if it aligns adjacent with that nucleotide
when the probe and target sequence are aligned to maximize
complementarity.) Note that although each of the pooled
probes has two other pooled positions, these positions are not
relevant for the present illustration. The positions are only
relevant when more than one position in the target sequence is
to be read, a circumstance that will be considered later. For
present purposes, the cell with the 'N' in the interrogation
position lights up for the wildtype sequence and any of the
three single base substitutions of the target sequence. The
cell with M/K in the interrogation position lights up for the
wildtype sequence and one of the single-base substitutions.
The cell with R/Y in the interrogation position lights up for
the wildtype sequence and a second of the single-base
substitutions. Thus, the four possible target sequences
hybridize to the three pools of probes in four distinct
patterns, and the four possible target sequences can be
distinguished.
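
The reason three cells suffice can be checked with a short Python sketch. It assumes an idealized all-or-none rule (a cell lights up only when its pool contains the base complementary to the target nucleotide); the helper names pools_for and signatures are hypothetical.

```python
# Pooled nucleotides that can occupy the interrogation position.
IUPAC = {"N": "ACGT", "M": "AC", "K": "GT", "R": "AG", "Y": "CT"}


def pools_for(wild_probe_base):
    """Choose N plus the M/K and R/Y pools that contain the wildtype probe base."""
    mk = "M" if wild_probe_base in IUPAC["M"] else "K"
    ry = "R" if wild_probe_base in IUPAC["R"] else "Y"
    return ["N", mk, ry]


def signatures(wild_probe_base):
    """Map each probe-side base to its lit/blank pattern over the three cells."""
    pools = pools_for(wild_probe_base)
    return {
        base: tuple(base in IUPAC[pool] for pool in pools)
        for base in "ACGT"
    }


# For a wildtype probe base T the three cells carry N, K and Y.
example = signatures("T")
```

Because M/K splits the four bases one way and R/Y splits them another way, each base falls into a different combination of pools, so the four targets give four different patterns whichever base is wildtype.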

To illustrate further, consider four possible target
sequences (differing at a single position) and a pooled probe
having three pooled positions, N, K and Y with the Y position
as the interrogation position (i.e., aligned with the variable
position in the target sequence):

Target
Wild   : ATTAACCACTCACGGGAGCTCT  (w)
Mutants: ATTAACCACTCcCGGGAGCTCT  (c)
Mutants: ATTAACCACTCgCGGGAGCTCT  (g)
Mutants: ATTAACCACTCtCGGGAGCTCT  (t)
             TGGTGNKYGCCCT       (pooled probe)

The sixteen individual component probes of the pooled probe
hybridize to the four possible target sequences as follows:

                       TARGET
                   w    c    g    t
TGGTGAGcGCCCT      n    n    y    n
TGGTGcGcGCCCT      n    n    n    n
TGGTGgGcGCCCT      n    n    n    n
TGGTGtGcGCCCT      n    n    n    n
TGGTGAtcGCCCT      n    n    n    n
TGGTGctcGCCCT      n    n    n    n
TGGTGgtcGCCCT      n    n    n    n
TGGTGttcGCCCT      n    n    n    n
TGGTGAGTGCCCT      y    n    n    n
TGGTGcGTGCCCT      n    n    n    n
TGGTGgGTGCCCT      n    n    n    n
TGGTGtGTGCCCT      n    n    n    n
TGGTGAtTGCCCT      n    n    n    n
TGGTGctTGCCCT      n    n    n    n
TGGTGgtTGCCCT      n    n    n    n
TGGTGttTGCCCT      n    n    n    n

The pooled probe hybridizes according to the aggregate of its
components:

Pool:
TGGTGNKYGCCCT      y    n    y    n

Thus, as stated above, it can be seen that a pooled probe
having a Y at the interrogation position hybridizes to the
wildtype target and one of the mutants. Similar tables can be
drawn to illustrate the hybridization patterns of probe pools
having other pooled nucleotides at the interrogation position.
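
Such a table can be generated mechanically. The sketch below is an illustration only and assumes an idealized rule in which a component probe lights up only when its exact complement occurs in the target; the names expand_pool, hybridizes and pool_pattern are hypothetical. Probes are written 3' to 5' as in the text, so a probe is complemented without being reversed.

```python
from itertools import product

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "K": "GT", "R": "AG", "Y": "CT",
    "W": "AT", "S": "CG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

TARGETS = {
    "w": "ATTAACCACTCACGGGAGCTCT",
    "c": "ATTAACCACTCcCGGGAGCTCT",
    "g": "ATTAACCACTCgCGGGAGCTCT",
    "t": "ATTAACCACTCtCGGGAGCTCT",
}


def expand_pool(pooled_probe):
    """List the component probes of a pooled probe."""
    choices = [IUPAC[code] for code in pooled_probe]
    return ["".join(combo) for combo in product(*choices)]


def hybridizes(probe, target):
    """Idealized rule: the probe's exact complement must occur in the target."""
    return probe.translate(COMPLEMENT) in target.upper()


def pool_pattern(pooled_probe, targets):
    """A pool lights up if any one of its component probes hybridizes."""
    components = expand_pool(pooled_probe)
    return {
        name: any(hybridizes(p, seq) for p in components)
        for name, seq in targets.items()
    }


pattern = pool_pattern("TGGTGNKYGCCCT", TARGETS)
```

With these assumptions the pool lights up for the wildtype target and for the g mutant only, which is the aggregate row of the table above. Replacing the Y with another pooled nucleotide gives the corresponding table for that pool.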
The above strategy of using pooled probes to analyze a
single base in a target sequence can readily be extended to
analyze any number of bases. At this point, the purpose of
including three pooled positions within each probe will become
apparent. In the example that follows, ten pools of probes,
each containing three pooled probe positions, can be used to
analyze each of a contiguous sequence of eight nucleotides
in a target sequence.

           ATTAACCACTCACGGGAGCTCT    Reference sequence
                  ^^^^^^^^           Readable nucleotides

Pools:
  4        TAATTNKYGAGTG
  5         AATTGNKRAGTGC
  6          ATTGGNKRGTGCC
  7           TTGGTNMRTGCCC
  8            TGGTGNKYGCCCT
  9             GGTGANKRCCCTC
 10              GTGAGNKYCCTCG
 11               TGAGTNMYCTCGA
 12                GAGTGNMYTCGAG
 13                 AGTGCNMYCGAGA

In this example, the different pooled probes tile across 
the reference sequence, each pooled probe differing from the 
next by increments of one nucleotide. For each of the 
readable nucleotides in the reference sequence, there are 
three probe pools having a pooled interrogation position 
aligned with the readable nucleotide. For example, the 12th
nucleotide from the left in the reference sequence is aligned
with pooled interrogation positions in pooled probes 8, 9, and
10. Comparison of the hybridization intensities of these
pooled probes reveals the identity of the nucleotide occupying
position 12 in a target sequence.



Targets                                Example Intensities:
                                       Pools
                                       8    9    10
Wild   : ATTAACCACTCACGGGAGCTCT        Y    Y    Y
Mutants: ATTAACCACTCcCGGGAGCTCT        N    Y    Y
Mutants: ATTAACCACTCgCGGGAGCTCT        Y    N    Y
Mutants: ATTAACCACTCtCGGGAGCTCT        N    N    Y

Y = lit cell
N = blank cell

Thus, for example, if pools 8, 9 and 10 all light up, one
knows the target sequence is wildtype. If pools 9 and 10
light up, the target sequence has a C mutant at position 12.
If pools 8 and 10 light up, the target sequence has a G mutant
at position 12. If only pool 10 lights up, the target
sequence has a T mutant at position 12.

The identity of other nucleotides in the target sequence
is determined by a comparison of other sets of three pooled
probes. For example, the identity of the 13th nucleotide in
the target sequence is determined by comparing the
hybridization patterns of the probe pools designated 9, 10 and
11. Similarly, the identity of the 14th nucleotide in the
target sequence is determined by comparing the hybridization
patterns of the probe pools designated 10, 11, and 12.
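
The decoding of position 12 from pools 8, 9 and 10 can be sketched as follows. This is an illustration under an idealized all-or-none hybridization rule, not a description of how intensities are actually scored; the helper names (target_regex, lit, pattern, read_position_12) are hypothetical. Each pooled probe is converted into a regular expression that matches any target segment exactly complementary to one of its components.

```python
import re

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "K": "GT", "R": "AG", "Y": "CT",
    "W": "AT", "S": "CG", "N": "ACGT",
}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

REFERENCE = "ATTAACCACTCACGGGAGCTCT"
POOLS = {8: "TGGTGNKYGCCCT", 9: "GGTGANKRCCCTC", 10: "GTGAGNKYCCTCG"}


def target_regex(pooled_probe):
    """Regex for target segments complementary to some component probe."""
    classes = []
    for code in pooled_probe:
        bases = "".join(COMPLEMENT[b] for b in IUPAC[code])
        classes.append("[" + bases + "]")
    return re.compile("".join(classes))


def lit(pooled_probe, target):
    """Idealized rule: a cell lights up only on an exact complementary match."""
    return target_regex(pooled_probe).search(target.upper()) is not None


def pattern(target):
    """Lit/blank pattern of pools 8, 9 and 10 for one target."""
    return tuple(lit(POOLS[n], target) for n in (8, 9, 10))


def read_position_12(target):
    """Return the base whose predicted pattern equals the observed pattern."""
    observed = pattern(target)
    for base in "ACGT":
        candidate = REFERENCE[:11] + base + REFERENCE[12:]
        if pattern(candidate) == observed:
            return base
    return None
```

Reading position 13 or 14 works the same way with pools 9, 10, 11 or pools 10, 11, 12 substituted into the pattern.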

In the above example, successive probes tile across the
reference sequence in increments of one nucleotide, and each
probe has three interrogation positions occupying the same
positions in each probe relative to the terminus of the probe
(i.e., the 7, 8 and 9th positions relative to the 3'
terminus). However, the trellis strategy does not require
that probes tile in increments of one or that the
interrogation positions occur in the same position in
each probe. In a variant of trellis tiling referred to as
"loop" tiling, a nucleotide of interest in a target sequence
is read by comparison of pooled probes, which each have a
pooled interrogation position corresponding to the nucleotide
of interest, but in which the spacing of the interrogation
position in the probe differs from probe to probe.
Analogously to the block tiling approach, this allows several
nucleotides to be read from a target sequence from a
collection of probes that are identical except at the
interrogation position. The identity in sequence of probes,
particularly at their 3' termini, simplifies synthesis of the
array and results in more uniform probe density per cell.

To illustrate the loop strategy, consider a reference
sequence of which the 4, 5, 6, 7 and 8th nucleotides (from the
3' terminus) are to be read. All of the four possible
nucleotides at each of these positions can be read from
comparison of hybridization intensities of five pooled probes.
Note that the pooled positions in the probes are different
(for example in probe 55, the pooled positions are 4, 5 and 6
and in probe 56, 5, 6 and 7).

     TAACCACTCACGGGAGCA   Reference sequence
55   ATTNKYGAGTGCC
56   ATTGNKRAGTGCC
57   ATTGGNKRGTGCC
58   ATTRGTNMGTGCC
59   ATTKRTGNGTGCC

Each position of interest in the reference sequence is read by
comparing hybridization intensities for the three probes
that have an interrogation position aligned with the
nucleotide of interest in the reference sequence. For
example, to read the fourth nucleotide in the reference
sequence, probes 55, 58 and 59 provide pools at the fourth
position. Similarly, to read the fifth nucleotide in the
reference sequence, probes 55, 56 and 59 provide pools at the
fifth position. As in the previous trellis strategy, one of
the three probes being compared has an N at the pooled
position and the other two have M or K, and R or Y (or W and
S). The hybridization pattern of the five pooled probes to
target sequences representing each possible nucleotide
substitution at five positions in the reference sequence is
shown below. Each possible substitution results in a unique
hybridization pattern at three pooled probes.

Targets                            Pools
                                   55   56   57   58   59
Wild  : TAACCACTCACGGGAGCA         Y    Y    Y    Y    Y
Mutant: TAAgCACTCACGGGAGCA         Y    N    N    N    N
Mutant: TAAtCACTCACGGGAGCA         Y    N    N    Y    N
Mutant: TAAaCACTCACGGGAGCA         Y    N    N    N    Y
Mutant: TAACgACTCACGGGAGCA         N    Y    N    N    N
Mutant: TAACtACTCACGGGAGCA         N    Y    N    N    Y
Mutant: TAACaACTCACGGGAGCA         Y    Y    N    N    N
Mutant: TAACCcCTCACGGGAGCA         N    Y    Y    N    N
Mutant: TAACCgCTCACGGGAGCA         Y    N    Y    N    N
Mutant: TAACCtCTCACGGGAGCA         N    N    Y    N    N
Mutant: TAACCAgTCACGGGAGCA         N    N    N    Y    N
Mutant: TAACCAtTCACGGGAGCA         N    Y    N    Y    N
Mutant: TAACCAaTCACGGGAGCA         N    N    Y    Y    N
Mutant: TAACCACaCACGGGAGCA         N    N    N    N    Y
Mutant: TAACCACcCACGGGAGCA         N    N    Y    N    Y
Mutant: TAACCACgCACGGGAGCA         N    N    N    Y    Y

Many variations on the loop and trellis tilings can be
created. All that is required is that each position in
sequence must have a probe with a 'N', a probe containing one
of R/Y, M/K or W/S, and a probe containing a different pool
from that set, complementary to the wild type target at that
position, and at least one probe with no pool at all at that
position. This combination allows all mutations at that
position to be uniquely detected and identified.
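
The uniqueness requirement can be verified computationally for the loop example. The sketch below is illustrative only: it assumes an idealized all-or-none hybridization rule, the helper names are hypothetical, and the sequence used for probe 55 (ATTNKYGAGTGCC) is an assumption chosen to be consistent with pooled positions 4, 5 and 6 and with the pools carried by probes 56 to 59.

```python
import re

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "K": "GT", "R": "AG", "Y": "CT",
    "W": "AT", "S": "CG", "N": "ACGT",
}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

REFERENCE = "TAACCACTCACGGGAGCA"
POOLS = {
    55: "ATTNKYGAGTGCC",  # assumed sequence, see lead-in
    56: "ATTGNKRAGTGCC",
    57: "ATTGGNKRGTGCC",
    58: "ATTRGTNMGTGCC",
    59: "ATTKRTGNGTGCC",
}


def lit(pooled_probe, target):
    """Idealized rule: lit only if some component is an exact complement."""
    classes = ["[" + "".join(COMPLEMENT[b] for b in IUPAC[c]) + "]"
               for c in pooled_probe]
    return re.search("".join(classes), target.upper()) is not None


def signature(target):
    """Y/N pattern over pools 55 to 59, in that order."""
    return "".join("Y" if lit(p, target) else "N" for p in POOLS.values())


def single_substitutions(reference, positions):
    """Yield every single-base variant at the given 1-based positions."""
    for pos in positions:
        for base in "ACGT":
            if base != reference[pos - 1]:
                yield reference[:pos - 1] + base + reference[pos:]


targets = [REFERENCE, *single_substitutions(REFERENCE, range(4, 9))]
signatures = {t: signature(t) for t in targets}
all_unique = len(set(signatures.values())) == len(targets)
```

Under these assumptions all sixteen targets (the wildtype and fifteen single-base variants) give different patterns, which is the property the loop tiling relies on.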

A further class of strategies involving pooled probes are 
termed coding strategies. These strategies assign code words 
from some set of numbers to variants of a reference sequence. 
Any number of variants can be coded. The variants can include 
multiple closely spaced substitutions, deletions or 
insertions. The designation letters or other symbols assigned 
to each variant may be any arbitrary set of numbers, in any 
order. For example, a binary code is often used, but codes to 
other bases are entirely feasible. The numbers are often 
assigned such that each variant has a designation having at 
least one digit and at least one nonzero value for that digit. 
For example, in a binary system, a variant assigned the number
101 has a designation of three digits, with one possible
nonzero value for each digit.

The designation of the variants is coded into an array of
pooled probes comprising a pooled probe for each nonzero
value of each digit in the numbers assigned to the variants.
For example, if the variants are assigned successive numbers in
a numbering system of base m, and the highest number assigned
to a variant has n digits, the array would have about
n x (m-1) pooled probes. In general, ... probes are required
to analyze all variants of N locations in a reference
sequence ...

Each pooled probe has a segment exactly complementary to the
reference sequence except that certain positions are pooled.
The segment should be sufficiently long to allow specific
hybridization of the pooled probe to the reference sequence
relative to a mutated form of the reference sequence. ... The
pooled positions comprise nucleotides that allow the pooled
probe to hybridize to every variant assigned a particular
nonzero value in a particular digit. Usually, the pooled
positions further comprise a nucleotide that allows the pooled
probe to hybridize to the reference sequence. Thus, a wildtype
target (or reference sequence) is immediately recognizable
from all the pooled probes being lit. When a target is
hybridized to the pools, only those pools comprising a
component probe having a segment that is exactly complementary
to the target light up. The identity of the target is then
decoded from the pattern of hybridizing pools. Each pool that
lights up is correlated with a particular value in a
particular digit. Thus, the aggregate hybridization patterns
of each lighting pool reveal the value of each digit in the
code defining the identity of the target hybridized to the
array.

As an example, consider a reference sequence having four
positions, each of which can be occupied by three possible
mutations. Thus, in total there are 4 x 3 possible variant
forms of the reference sequence. Each variant is assigned a
binary number 0001-1100 and the wildtype reference sequence
is assigned the binary number 1111.

Positions            X       X       X       X
Target:     TAAC   C=1111  A=1111  C=1111  T=1111   CACGGGAGCA
                   G=0001  C=0010  G=0011  A=0100
                   T=0101  G=0110  T=0111  C=1000
                   A=1001  T=1010  A=1011  G=1100

A first pooled probe is designed by including probes that
complement exactly each variant having a 1 in the first digit.

Target (1111):      TAAC  C  A  C  T  CACGGGAGCA
Mutant (0001):      TAAC  g  A  C  T  CACGGGAGCA
Mutant (0101):      TAAC  t  A  C  T  CACGGGAGCA
Mutant (1001):      TAAC  a  A  C  T  CACGGGAGCA
Mutant (0011):      TAAC  C  A  g  T  CACGGGAGCA
Mutant (0111):      TAAC  C  A  t  T  CACGGGAGCA
Mutant (1011):      TAAC  C  A  a  T  CACGGGAGCA

First pooled probe = ATTG [GCAT] T [GCAT] A GTGCCC
                   = ATTG    N   T    N   A GTGCCC


Second, third and fourth pooled probes are then designed
respectively including component probes that hybridize to each
variant having a 1 in the second, third and fourth digit.

                XXXX - 4 positions examined
Target:     TAACCACTCACGGGAGCA
Pool 1(1):  ATTGnTnAGTGCCC = 16 probes (4x1x4x1)
Pool 2(2):  ATTGGnnAGTGCCC = 16 probes (1x4x4x1)
Pool 3(4):  ATTGyrydGTGCCC = 24 probes (2x2x2x3)
Pool 4(8):  ATTGmwmbGTGCCC = 24 probes (2x2x2x3)

The pooled probes hybridize to variant targets as follows:
Hybridization pattern:

                                             Pools
                  Targets                  4   3   2   1
Wild  (1111):     TAACCACTCACGGGAGCA       Y   Y   Y   Y
Mutant(0001):     TAACgACTCACGGGAGCA       N   N   N   Y
Mutant(0101):     TAACtACTCACGGGAGCA       N   Y   N   Y
Mutant(1001):     TAACaACTCACGGGAGCA       Y   N   N   Y

Mutant(0010):     TAACCcCTCACGGGAGCA       N   N   Y   N
Mutant(0110):     TAACCgCTCACGGGAGCA       N   Y   Y   N
Mutant(1010):     TAACCtCTCACGGGAGCA       Y   N   Y   N

Mutant(0011):     TAACCAgTCACGGGAGCA       N   N   Y   Y
Mutant(0111):     TAACCAtTCACGGGAGCA       N   Y   Y   Y
Mutant(1011):     TAACCAaTCACGGGAGCA       Y   N   Y   Y

Mutant(0100):     TAACCACaCACGGGAGCA       N   Y   N   N
Mutant(1000):     TAACCACcCACGGGAGCA       Y   N   N   N
Mutant(1100):     TAACCACgCACGGGAGCA       Y   Y   N   N
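
The construction of the four pools from the code assignment, and the decoding of a target from the pools that light up, can be sketched as below. This is an illustration only. The names are hypothetical, hybridization is treated as all-or-none, the pools are described by the target-side bases they accept at the four examined positions, and the a substitution at the third examined position is taken to carry the code 1011 as in the code diagram.

```python
from math import prod

WILD = "CACT"  # reference bases at the four examined positions

# (position index, substituted target base) -> assigned binary number
CODES = {
    (0, "G"): 0b0001, (0, "T"): 0b0101, (0, "A"): 0b1001,
    (1, "C"): 0b0010, (1, "G"): 0b0110, (1, "T"): 0b1010,
    (2, "G"): 0b0011, (2, "T"): 0b0111, (2, "A"): 0b1011,
    (3, "A"): 0b0100, (3, "C"): 0b1000, (3, "G"): 0b1100,
}


def build_pool(bit):
    """Target bases accepted at each position by the pool for one digit."""
    allowed = [{w} for w in WILD]  # every pool accepts the wildtype
    for (pos, base), code in CODES.items():
        if (code >> bit) & 1:
            allowed[pos].add(base)
    return allowed


def pool_size(bit):
    """Number of component probes in the pool for one digit."""
    return prod(len(choices) for choices in build_pool(bit))


def decode(bases):
    """Binary number read from the pools lit by a four-base target segment."""
    value = 0
    for bit in range(4):
        pool = build_pool(bit)
        if all(b in choices for b, choices in zip(bases, pool)):
            value |= 1 << bit
    return value
```

Under these assumptions the four pools contain 16, 16, 24 and 24 component probes, the wildtype segment CACT decodes to 1111, and every single substitution decodes to its own assigned number.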



The identity of a variant (i.e., mutant) target is read
directly from the hybridization pattern of the pooled probes.
For example, the mutant assigned the number 0001 gives a
hybridization pattern of NNNY with respect to probes 4, 3, 2
and 1 respectively.

In the above example, variants are assigned successive
numbers in a numbering system. In other embodiments, sets of
numbers can be chosen for their properties. If the codewords
are chosen from an error-control code, the properties of that
code carry over to sequence analysis. An error code is a
numbering system in which some designations are assigned to
variants and other designations serve to indicate errors that
may have occurred in the hybridization process. For example,
if all codewords have an odd number of nonzero digits ("binary
coding + error detection"), any single error in hybridization
will be detected by having an even number of pools lit.
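
The parity idea can be made concrete with a short sketch. It is an illustration, not the scoring used on any actual chip: the function names are hypothetical, and a hybridization error is modeled as a single pool flipping between lit and blank.

```python
def add_parity(code, width=4):
    """Prefix one digit so that the codeword has an odd number of ones."""
    ones = bin(code).count("1")
    parity = 0 if ones % 2 == 1 else 1
    return (parity << width) | code


def is_valid(pattern):
    """A lit-pool pattern is accepted only if an odd number of pools is lit."""
    return bin(pattern).count("1") % 2 == 1


# Four-digit codes of the wildtype and the twelve variants.
CODES = [0b1111, 0b0001, 0b0101, 0b1001, 0b0010, 0b0110, 0b1010,
         0b0011, 0b0111, 0b1011, 0b0100, 0b1000, 0b1100]
CODEWORDS = [add_parity(code) for code in CODES]


def single_errors(codeword, width=5):
    """All patterns that differ from the codeword in exactly one pool."""
    return [codeword ^ (1 << bit) for bit in range(width)]
```

Every five-digit codeword has an odd number of ones, so any pattern reached by a single flip has an even number and is flagged. The scheme detects such an error but does not by itself say which pool misbehaved.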



Target (wild):  TAACCACTCACGGGAGCA

Pool 1(1):      ATTGnAnAGTGCCC = 16 Probes (4x1x4x1)
Pool 2(2):      ATTGGnnAGTGCCC = 16 Probes (1x4x4x1)
Pool 3(4):      ATTGryrhGTGCCC = 24 Probes (2x2x2x3)
Pool 4(8):      ATTGkwkvGTGCCC = 24 Probes (2x2x2x3)

A fifth probe can be added to make the number of pools that
hybridize to any single mutation odd.

Pool 5(c):      ATTGdhsmGTGCCC = 36 probes (2x2x3x3)

Hybridization of pooled probes to targets

                                              Pool
                   Target                   5   4   3   2   1
Target(11111):     TAACCACTCACGGGAGCA       Y   Y   Y   Y   Y
Mutant(00001):     TAACgACTCACGGGAGCA       N   N   N   N   Y
Mutant(10101):     TAACtACTCACGGGAGCA       Y   N   Y   N   Y
Mutant(11001):     TAACaACTCACGGGAGCA       Y   Y   N   N   Y

Mutant(00010):     TAACCcCTCACGGGAGCA       N   N   N   Y   N
Mutant(10110):     TAACCgCTCACGGGAGCA       Y   N   Y   Y   N
Mutant(11010):     TAACCtCTCACGGGAGCA       Y   Y   N   Y   N

Mutant(10011):     TAACCAgTCACGGGAGCA       Y   N   N   Y   Y
Mutant(00111):     TAACCAtTCACGGGAGCA       N   N   Y   Y   Y
Mutant(01011):     TAACCAaTCACGGGAGCA       N   Y   N   Y   Y

Mutant(00100):     TAACCACaCACGGGAGCA       N   N   Y   N   N
Mutant(01000):     TAACCACcCACGGGAGCA       N   Y   N   N   N
Mutant(11100):     TAACCACgCACGGGAGCA       Y   Y   Y   N   N

g. Bridging strategy

Probes that contain partial matches to two separate
(i.e., noncontiguous) subsequences of a target sequence
sometimes hybridize strongly to the target sequence. In
certain instances, such probes have generated stronger signals
than probes of the same length which are perfect matches to
the target sequence. It is believed (but not necessary to the
invention) that this observation results from interactions of
a single target sequence with two or more probes
simultaneously. This invention exploits this observation to
provide arrays of probes having at least first and second
segments, which are respectively complementary to first and
second subsequences of a reference sequence. Optionally, the
probes may have a third or more complementary segments. These
probes can be employed in any of the strategies noted above.
The two segments of such a probe can be complementary to
disjoint subsequences of the reference sequences or contiguous
subsequences. If the latter, the two segments in the probe
are inverted relative to the order of the complement of the
reference sequence. The two subsequences of the reference
sequence each typically comprise about 3 to 30 contiguous
nucleotides. The subsequences of the reference sequence are
sometimes separated by 0, 1, 2 or 3 bases. Often the
sequences are adjacent and nonoverlapping.

For example, a wild-type probe is created by
complementing two sections of a reference sequence (indicated
by subscript and superscript) and reversing their order. The
interrogation position is designated (*) and is apparent from
comparison of the structure of the wildtype probe with the
three mutant probes. The corresponding nucleotide in the
reference sequence is the "a" in the superscripted segment.

Reference:  5' T GGCTA CGAGG AATCATCTGTTA

Probes:     3' GCTCC CCGAT  (Probe from first probe set)
            3' GCACC CCGAT
            3' GCCCC CCGAT
            3' GCGCC CCGAT

The expected hybridizations are:

Match:
          GCTC CCCGAT
       ...TGGCTACGAGGAATCATCTGTTA
          GCTCCCCGAT

Mismatch:
          GCTC CCCGAT
       ...TGGCTACGAGGAATCATCTGTTA
          GCGCCCCGAT
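
The construction of the bridging probe set from the reference can be sketched as follows. This is illustrative only: the function names and the slice-based arguments are hypothetical conventions, and probes are written 3' to 5' as in the example, so each section is complemented without being reversed internally.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
REFERENCE = "TGGCTACGAGGAATCATCTGTTA"


def bridge_probe(reference, first, second):
    """Complement two sections of the reference and reverse their order.

    first and second are (start, end) slices, 0-based, end exclusive.
    """
    seg1 = reference[first[0]:first[1]].translate(COMPLEMENT)
    seg2 = reference[second[0]:second[1]].translate(COMPLEMENT)
    return seg2 + seg1


def probe_set(wild_probe, interrogation_index):
    """The four probes differing only at the interrogation position."""
    i = interrogation_index
    return [wild_probe[:i] + base + wild_probe[i + 1:] for base in "ACGT"]


# Sections GGCTA and CGAGG of the reference.
wild = bridge_probe(REFERENCE, (1, 6), (6, 11))
probes = probe_set(wild, 2)
```

With the two sections GGCTA and CGAGG this gives the wildtype probe GCTCCCCGAT, and varying its third base gives the three mutant probes of the set.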

Bridge tilings are specified using a notation which gives
the length of the two constituent segments and the relative
position of the interrogation position. The designation n/m
indicates a segment complementary to a region of the reference
sequence which extends for n bases and is located such that
the interrogation position is in the mth base from the 5' end.
If m is larger than n, this indicates that the entire segment
is to the 5' side of the interrogation position. If m is
negative, it indicates that the interrogation position is the
absolute value of m bases 5' of the first base of the segment
(m cannot be zero). Probes comprising multiple segments, such
as n/m + a/b + ... have a first segment at the 3' end of the
probe and additional segments added 5' with respect to the
first segment. For example, a 4/8 + 6/3 tiling consists of
(from the 3' end of the probe) a 4 base complementary segment,
starting 7 bases 5' of the interrogation position, followed by
a 6 base region in which the interrogation position is located
at the third base. Between these two segments, one base from
the reference sequence is omitted. By this notation, the set
shown above is a 5/3 + 5/8 tiling. Many different tilings are
possible with this method, since the lengths of both segments
can be varied, as well as their relative position (they may be
in either order and there may be a gap between them) and their
location relative to the interrogation position.

As an example, a 16 mer oligo target was hybridized to a
chip containing all 4^10 probes of length 10. The chip
includes short tilings of both standard and bridging types.
The data from a standard 10/5 tiling was compared to data from
a 5/3 + 5/8 bridge tiling (see Table 1). Probe intensities
(mean count/pixel) are displayed along with discrimination
ratios (correct probe intensity / highest incorrect probe
intensity). Missing intensity values are less than 50 counts.
Note that for each base displayed the bridge tiling has a
higher discrimination value.

TABLE 1: Comparison of Standard and Bridge Tilings

TILING         PROBE BASE     CORRECT PROBE BASE
                              C       A       C       C

STANDARD       A              92      496     294     299
(10/5)         C              536     148     532     534
               G              69      167     72      52
               T              146     95      212     126
DISCRIMINATION:               3.7     3.0     1.8     1.8

BRIDGING       A, C, G, T     404, 156, 276, 345, 379, 80, 58
5/3 + 5/8                     (intensities of 50 counts or more,
                              in row order)
DISCRIMINATION:               >5.5    5.1     2.4     1.26


The bridging strategy offers the following advantages:
(1) Higher discrimination between matched and mismatched
probes,
(2) The possibility of using longer probes in a bridging
tiling, thereby increasing the specificity of the
hybridization, without sacrificing discrimination,
(3) The use of probes in which an interrogation position
is located very off-center relative to the regions of target
complementarity. This may be of particular advantage when,
for example, a probe centered about one region of the
target gives low hybridization signal. The low signal is
overcome by using a probe centered about an adjoining region
giving a higher hybridization signal.
(4) Disruption of secondary structure that might result
in annealing of certain probes (see previous discussion of
helper mutations).



h. Deletion Tiling

Deletion tiling is related to both the bridging and
helper mutant strategies described above. In the deletion
strategy, comparisons are performed between probes sharing a
common deletion but differing from each other at an
interrogation position located outside the deletion. For
example, a first probe comprises first and second segments,
each exactly complementary to respective first and second
subsequences of a reference sequence, wherein the first and
second subsequences of the reference sequence are separated by
a short distance (e.g., 1 or 2 nucleotides). The order of the
first and second segments in the probe is usually the same as
that of the complement to the first and second subsequences in
the reference sequence. The interrogation position is usually
... probes, which are identical to the first probe except at an
interrogation position, which is different in each probe.

Reference:  ...AGTACCAGATCTCTAA
Probe set:      CATGGNC AGAGA   (N = interrogation position)

Such tilings sometimes offer superior discrimination in
hybridization intensities between the probe having an
interrogation position complementary to the target and other
probes. Thermodynamically, the difference between the
hybridizations to matched and mismatched targets for the probe
set shown above is the difference between a single-base bulge,
and a large asymmetric loop (e.g., two bases of target, one of
probe). This often results in a larger difference in
stability than the comparison of a perfectly matched probe
with a probe showing a single base mismatch in the basic
tiling strategy.

The superior discrimination offered by deletion tiling is
illustrated by Table 2, which compares hybridization data from
a standard 10/5 tiling with a (4/8 + 6/3) deletion tiling of
the reference sequence. (The numerators indicate the length
of the segments and the denominators, the spacing of the
deletion from the far termini of the segments.) Probe
intensities (mean count/pixel) are displayed along with
discrimination ratios (correct probe intensity / highest
incorrect probe intensity). Note that for each base displayed
the deletion tiling has a higher discrimination value than
either standard tiling shown.

TABLE 2. Comparison of Standard and Deletion Tilings

TILING         PROBE BASE     CORRECT PROBE BASE
                              C       A       C       C

STANDARD       A              92      496     294     299
(10/5)         C              536     148     532     534
               G              69      167     72      52
               T              146     95      212     126
DISCRIMINATION:               3.7     3.0     1.8     1.8

DELETION       A              6       412     29      48
4/8 + 6/3      C              297     32      465     160
               G              8       77      10      4
               T              8       26      31      5
DISCRIMINATION:               37.1    5.4     15      3.3

STANDARD       A              347     533     228     277
(10/7)         C              729     194     536     496
               G              232     231     102     89
               T              344     133     163     150
DISCRIMINATION:               2.1     2.3     2.3     1.8



These probes can be used in any of the tiling strategies of
the invention. As well as offering superior discrimination,
the use of deletion or bridging strategies is advantageous for
certain probes to avoid self-hybridization (either within a
probe or between two probes of the same sequence).
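
The discrimination ratio used in Tables 1 and 2 (correct probe intensity divided by highest incorrect probe intensity) is simple to compute. The sketch below uses the Table 2 intensities for the 4/8 + 6/3 deletion tiling; the function name and the data layout are hypothetical.

```python
def discrimination(intensities, correct_base):
    """Correct probe intensity / highest incorrect probe intensity."""
    correct = intensities[correct_base]
    highest_incorrect = max(
        value for base, value in intensities.items() if base != correct_base
    )
    return correct / highest_incorrect


# Deletion tiling (4/8 + 6/3) columns from Table 2, in mean counts/pixel,
# each paired with the correct probe base for that column.
DELETION = [
    ({"A": 6, "C": 297, "G": 8, "T": 8}, "C"),
    ({"A": 412, "C": 32, "G": 77, "T": 26}, "A"),
    ({"A": 29, "C": 465, "G": 10, "T": 31}, "C"),
    ({"A": 48, "C": 160, "G": 4, "T": 5}, "C"),
]

ratios = [round(discrimination(column, base), 1) for column, base in DELETION]
```

Applied to the four columns, this reproduces the discrimination row reported for the deletion tiling in Table 2.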

Preparation of Target Samples

The target polynucleotide, whose sequence is to be
determined, is usually isolated from a tissue sample. If the
target is genomic, the sample may be from any tissue (except
exclusively red blood cells). For example, whole blood,
peripheral blood lymphocytes or PBMC, skin, hair or semen are
convenient sources of clinical samples. These sources are
also suitable if the target is RNA. Blood and other body
fluids are also a convenient source for isolating viral
nucleic acids. If the target is mRNA, the sample is obtained
from a tissue in which the mRNA is expressed. If the
polynucleotide in the sample is RNA, it is usually reverse
transcribed to DNA. DNA samples or cDNA resulting from
reverse transcription are usually amplified, e.g., by PCR.
Depending on the selection of primers and amplifying
enzyme(s), the amplification product can be RNA or DNA.
Paired primers are selected to flank the borders of a target
polynucleotide of interest. More than one target can be
simultaneously amplified by multiplex PCR in which multiple
paired primers are employed. The target can be labelled at
one or more nucleotides during or after amplification. For
some target polynucleotides (depending on size of sample),
e.g., episomal DNA, sufficient DNA is present in the tissue
sample to dispense with the amplification step.

When the target strand is prepared in single-stranded
form as in preparation of target RNA, the sense of the strand
should of course be complementary to that of the probes on the
chip. This is achieved by appropriate selection of primers.
The target is preferably fragmented before application to the
chip to reduce or eliminate the formation of secondary
structures in the target. The average size of target
segments following hybridization is usually larger than the
size of probe on the chip.

II. ILLUSTRATIVE CHIPS
A. HIV Chip

HIV has infected a large and expanding number of people,
resulting in massive health care expenditures. HIV can
rapidly become resistant to drugs used to treat the infection,
primarily due to the action of the heterodimeric protein (51
kDa and 66 kDa) HIV reverse transcriptase (RT), both subunits
of which are encoded by the 1.7 kb pol gene. The high error
rate (5-10 per round) of the RT protein is believed to account
for the hypermutability of HIV. The nucleoside analogues,
i.e., AZT, ddI, ddC, and d4T, commonly used to treat HIV
infection are converted to nucleotide analogues by sequential
phosphorylation in the cytoplasm of infected cells, where
incorporation of the analogue into the viral DNA results in
termination of viral replication, because the 5' -> 3'
phosphodiester linkage cannot be completed. However, after
about 6 months to 1 year of treatment or less, HIV typically
mutates the RT gene so as to become incapable of incorporating
the analogue and so resistant to treatment. Several mutations
known to be associated with drug resistance are shown in the
table below. After a virus having drug resistance via a
mutation becomes predominant, the patient suffers dramatically
increased viral load, worsening symptoms (typically more
frequent and difficult-to-treat infections), and ultimately
death. Switching to a different treatment regimen as soon as
a resistant mutant virus takes hold may be an important step
in patient management which prolongs patient life and reduces
morbidity during life.

Other mutations confer resistance to other drugs.

A second important therapeutic target is the HIV protease,
whose function is required for formation of infectious
progeny. See Robins & Plattner, J. Acquired ...; ... Curr.
Op. Infect. Dis. 7:72-81 (1994).
The protease function in 
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processing of viral precursor polypeptides to their active 
forms Drugs targeted against this enzyme do not impair 
endogenous human proteases, thereby achieving a high degree of 
selective toxicity. Moreover, the protease is expressed later 
5 in the life-cycle that reverse transcriptase, thereby offering 
the possibility of a combined attack on HIV at two different 
times in its life-cycle. As for drugs targeted against the 
reverse transcriptase, administration of drugs to the protease 
can result in acquisition of drug resistance through mutation 
10 of the protease. By monitoring the protease gene from 
patients, it is possible to detect the occurrence of 
mutations, and thereby make appropriate adjustments in the 
drug(s) being administered. 

In addition to being infected with HIV, AIDS patients are
often also infected with a wide variety of other infectious
agents giving rise to a complex series of symptoms. Often
diagnosis and treatment are difficult because many different
pathogens (some life-threatening, others routine) cause
similar symptoms. Some of these infections, so-called
opportunistic infections, are caused by bacterial, fungal,
protozoan or viral pathogens which are normally present in
small quantity in the body, but are held in check by the
immune system. When the immune system in AIDS patients fails,
these normally latent pathogens can grow and generate rampant
infection. In treating such patients, it would be desirable
simultaneously to diagnose the presence or absence of a
variety of the most lethal common infections, determine the
most effective therapeutic regime against the HIV virus, and
monitor the overall status of the patient's infection.

The present invention provides DNA chips for detecting
the multiple mutations in HIV genes associated with resistance
to different therapeutics. These DNA chips allow physicians
to monitor mutations over time and to change therapeutics if
resistance develops. Some chips also provide probes for
diagnosis of pathogenic microorganisms that typically occur in
AIDS patients.

The sequence selected as a reference sequence can be from
anywhere in the HIV genome, but should preferably cover a
region of the HIV genome in which mutations associated with
drug resistance are known to occur. A reference sequence is
usually between about 5, 10, [illegible], 5,000 or 10,000
bases in length, and preferably is about [illegible] bases
in length. Some reference sequences encompass at least part
of the reverse transcriptase sequence. Preferably, the
reference sequence encompasses all, or substantially all
(i.e., [illegible]) of the reverse transcriptase gene.
Reverse transcriptase is the target of several drugs and, as
noted above, the coding sequence is the site of many
mutations associated with drug resistance. In some chips,
the reference sequence contains the entire region coding
reverse transcriptase (850 bp), and in other chips,
subfragments thereof. In some chips, the reference sequence
includes other subfragments of the pol gene encoding HIV
protease or endonuclease, instead of, or as well as, the
segment encoding reverse transcriptase. In some chips, the
reference sequence also includes other HIV genes such as env
or gag as well as or instead of the reverse transcriptase
gene. Certain regions of the gag and env genes are relatively
well conserved, and their detection provides a means for
identifying and quantifying the amount of HIV virus infecting
a patient. In some chips, the reference sequence comprises an
entire HIV genome.

[Text illegible] the reference sequence [text illegible].
Within these generic groupings there are several strains
[text illegible] strains, the sequences of which are
available [text illegible]. The reverse transcriptase genes
of [strain names illegible] differ at 23 nucleotides. The
HXB2 and HXB2R strains have the same reverse transcriptase
gene sequence, which differs from that of the BRU strain at
four nucleotides, and [text illegible] nucleotides. In some
chips, the reference sequence corresponds exactly to the
reverse transcriptase sequence in the wildtype version of a
strain. In other chips, the reference sequence corresponds to
a consensus sequence of several HIV strains. In some chips,
the reference sequence corresponds to a mutant form of a HIV
strain.

Chips are designed in accordance with the tiling
strategies noted above. The probes are designed to be
complementary to either the coding or noncoding strand of the
HIV reference sequence. If only one strand is to be read, it
is preferable to read the coding strand. The greater
percentage of A residues in this strand relative to the
noncoding strand generally results in fewer regions of
ambiguous sequence.

Some chips contain additional probes or groups of probes
designed to be complementary to a second reference sequence.
The second reference sequence is often a subsequence of the
first reference sequence bearing one or more commonly
occurring HIV mutations or interstrain variations (e.g.,
within codons 67, 70, 215 or 219 of the reverse transcriptase
gene). The inclusion of a second group is particularly useful
for analyzing short subsequences of the primary reference
sequence in which multiple mutations are expected to occur
within a short distance commensurate with the length of the
probes (i.e., two or more mutations within [illegible] to
[illegible] bases).

The total number of probes on the chips depends on the
tiling strategy, the length of the reference sequence and the
options selected with respect to inclusion of multiple probe
lengths and secondary groups of probes to analyze the
existence of common mutations. To read much or all of
the HIV reverse transcriptase gene (857 b for the BRU strain),
chips tiled by the basic strategy typically contain at least
857 x 4 = 3428 probes.
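
The probe count above follows from the basic tiling strategy: four probes (one per candidate base at the interrogation position) for every base of the reference sequence. A minimal Python sketch of that arithmetic is given here; the function name and the `n_probe_lengths` parameter are illustrative assumptions, not terms from the specification.

```python
def basic_tiling_probe_count(reference_length: int, n_probe_lengths: int = 1) -> int:
    """Count probes needed by the basic tiling strategy.

    Each reference base is interrogated by four probes (A, C, G and T
    at the interrogation position). Tiling the same reference with
    several probe lengths multiplies the count accordingly.
    """
    if reference_length < 0 or n_probe_lengths < 1:
        raise ValueError("reference_length must be >= 0 and n_probe_lengths >= 1")
    return 4 * reference_length * n_probe_lengths


# 857-base reverse transcriptase reference, a single probe length
print(basic_tiling_probe_count(857))      # 3428
# the same reference tiled with four different probe lengths
print(basic_tiling_probe_count(857, 4))   # 13712
```

Control probes and any secondary probe groups for common mutations would add to these minimum figures.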

The target HIV polynucleotide, whose sequence is to be
determined, is usually isolated from blood samples (peripheral
blood lymphocytes or PBMC) in the form of RNA. The RNA is
reverse transcribed to DNA, and the DNA product is then
amplified. Depending on the selection of primers and
amplifying enzyme, the amplification product can be RNA or
DNA. Suitable primers for amplification of target are shown
in the table below.
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TABLE 4 
AMPLIFICATION OF TARGET

TARGET SIZE: 1,742 bp
  FORWARD PRIMER: GTAGAATTCTGTTGACTCAGATTGG
  REVERSE PRIMER: GATAAGCTTGGGCCTTATCTATTCCAT

TARGET SIZE: 535 bp
  FORWARD PRIMER: AAATCCATACAATACTCCAGTATTTGC
  REVERSE PRIMER: ACCCATCCAAAGGAATGGAGGTTCTTTC

TARGET SIZE: 323 bp
  FORWARD PRIMER: Genbank # K02013 1889-1908
  REVERSE PRIMER: bases 2211-2192

TARGET SIZE: --
  FORWARD PRIMER: AATTAACCCTCACTAAAGGGAgaggaagaatctgttgactcagattggt (RT#1-T3)
  REVERSE PRIMER: AATTTAATACGACTCACTATAGGGAtttccccactaacttctgtatgtcattgaca-3' (89-391 T7)

TARGET SIZE: --
  PRIMER: AATTAACCCTCACTAAAGGGAgaagtatactgcattaccatacctagta (RT#3-T3)

TARGET SIZE: --
  PRIMER: TAATACGACTCACTATAGGGAGAtcgacgcaggactcggcttgctgaa (HV1-T2)

TARGET SIZE: --
  PRIMER: AATTAACCCTCACTAAAGGGAGAccttgtaagtcattggtcttaaaggta (HV2-T3)


In another aspect of the invention, chips are provided
for simultaneous detection of HIV and microorganisms that
commonly parasitize AIDS patients (e.g., cytomegalovirus
(CMV), Pneumocystis carinii (PCP), fungi (Candida albicans),
mycobacteria). Non-HIV viral pathogens are detected and their
drug resistance determined using a similar strategy as for
HIV. That is, groups of probes are designed to show
complementarity to a target sequence from a region of the
genome of a non-HIV viral pathogen known to be associated with
acquisition of drug resistance. For example, CMV and HSV
viruses, which frequently co-parasitize AIDS patients, undergo
mutations to acquire resistance to acyclovir.

For detection of non-viral pathogens, the chips include
an array of probes which allow full-sequence determination of
16S ribosomal RNA or corresponding genomic DNA of the
pathogens. The additional probes are designed by the same
principles as described above except that the target sequence
is a variable region from a 16S RNA (or corresponding DNA) of
a pathogenic microorganism. Alternatively, the target
sequence can be a consensus sequence of variable 16S rRNA
regions from multiple organisms. 16S ribosomal DNA and RNA is
present in all organisms (except viruses) and the sequence of
the DNA or RNA is closely related to the evolutionary genetic
distance between any two species. Hence, organisms which are
quite close in type (e.g., all mycobacteria) share a common
region of 16S rDNA, and differ in other regions (variable
regions) of the 16S rRNA. These differences can be exploited
to allow identification of the different subtype strains. The
full sequence of 16S ribosomal RNA or DNA read from the chip
is compared against a database of the sequence of thousands of
known pathogens to type unambiguously most nonviral pathogens
infecting AIDS patients.

In a further embodiment, the invention provides chips
which also contain probes for detection of bacterial genes
conferring antibiotic resistance. An antibiotic resistance
gene can be detected by hybridization to a single probe
employed in a reverse dot blot format. Alternatively, a group
of probes can be designed according to the same principles
discussed above to read all or part of the DNA sequence
encoding an antibiotic resistance gene. Analogous probe
groups are designed for reading other antibiotic resistance
gene sequences. Antibiotic resistance frequently resides in
one of the following genes in microorganisms coparasitizing
AIDS patients: rpoB (encoding RNA polymerase), katG (encoding
catalase peroxidase), and DNA gyrase A and B genes.

The inclusion of probes for combinations of tests on a
single chip simulates the clinical diagnosis tree that a
physician would follow based on the presentation of a given
syndrome which could be caused by any number of possible
pathogens. Such chips allow identification of the presence
and titer of HIV in a patient, identification of the HIV
strain type and drug resistance, identification of
opportunistic pathogens, and identification of the drug
resistance of such pathogens. Thus, the physician is
simultaneously apprised of the full spectrum of pathogens
infecting the patient and the most effective treatments
therefor.
Exemplary HIV Chips

(a) HV 273 Chip

The HV 273 chip contains an array of oligonucleotide
probes for analysis of an 857 base HIV amplicon between
nucleotides 2090 and 2946 (HIVBRU strain numbering). The chip
contains four groups of probes: 11 mers, 13 mers, 15 mers and
17 mers. From top to bottom, the HV 273 chip is occupied by
rows of 11 mers, followed by rows of 13 mers, followed by rows
of 15 mers, followed by rows of 17 mers. The interrogation
position is nucleotide 6, 7, 8 and 9 respectively in the
different sized probes. This arrangement of the different
sized probes is referred to as being "in series." Within each
size group, there are four probe sets laid down in an A-lane,
a C-lane, a G-lane and a T-lane respectively. Each lane
contains an overlapping series of probes with one probe for
each nucleotide in the 2090-2946 HIV reverse transcriptase
reference sequence (i.e., 857 probes per lane). The lanes
also include a few column positions which are empty or
occupied by control probes. These positions serve to orient
the chip, determine background fluorescence and punctuate
different subsequences within the target. The chip has an area
of 1.28 x 1.28 cm, within which the probes form a 130 x 135
matrix (17,550 cells total). The area occupied by each probe
(i.e., a probe cell) is about 98 x 95 microns.
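
The lane layout described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name `tile_lanes` and the toy reference are hypothetical, probes are written as the base-by-base complement aligned left-to-right with the reference window (the strand orientation on the real chip is not modelled), and only positions whose full window fits inside the reference are tiled, whereas the chip provides a probe for every reference base.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def tile_lanes(reference: str, probe_len: int, interrogation_pos: int) -> dict:
    """Build A-, C-, G- and T-lanes for one probe length.

    interrogation_pos is 1-based within the probe (e.g. 6 for an 11 mer).
    Each lane receives one probe per tiled reference position; the four
    probes in a column differ only at the interrogation position.
    """
    left = interrogation_pos - 1            # bases before the interrogation position
    right = probe_len - interrogation_pos   # bases after it
    lanes = {base: [] for base in "ACGT"}
    for i in range(left, len(reference) - right):
        window = reference[i - left : i + right + 1]
        aligned = "".join(COMPLEMENT[b] for b in window)
        for base in "ACGT":
            lanes[base].append(aligned[:left] + base + aligned[left + 1 :])
    return lanes


lanes = tile_lanes("ACGTACGTACGTACG", probe_len=11, interrogation_pos=6)
print(lanes["G"][0])   # TGCATGCATGC  (perfect complement of the first window)
print(lanes["A"][0])   # TGCATACATGC  (single mismatch at position 6)

# Cell budget of the HV 273 layout: 4 probe sizes x 4 lanes x 857 positions
print(4 * 4 * 857, "probe cells of", 130 * 135)   # 13712 probe cells of 17550
```

In each column exactly one lane carries the perfect complement of the reference, which is the property the base-calling comparison relies on.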

The chip was tested for its capacity to sequence a
reverse transcriptase fragment from the HIV strain SF2. An
831 bp RNA fragment (designated pPol19) spanning most of the
HIV reverse transcriptase coding sequence was amplified by
PCR, using primers tagged with T3 and T7 promoter sequences.
The primers, designated RT#1-T3 and 89-391 T7, are shown in
Table 4. See also Gingeras et al., J. Inf. Dis. 164, 1066-1074
(1991) (incorporated by reference in its entirety for all
purposes). RNA was labelled by incorporation of fluorescent
nucleotides. The RNA was fragmented by heating and hybridized
to the chip for 40 min at 30 degrees. Hybridization signals
were quantified by fluorescence imaging.

Taking the best data from the four probe sets at each
position in the target sequence, 715 out of 821 bases were
read correctly (87%). (Comparisons are based on the sequence
of pPol19 determined by the conventional dideoxy method to be
identical to SF2.) In general, the longer sized probes
yielded more sequence than the shorter probes. Of the 21
positions at which the SF2 and BRU strains diverged within the
target, 19 were read correctly.
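
The comparison of signals within a column, and the pooling of results across the four probe sizes, might be sketched as follows. This is only an illustrative assumption about how such calls could be made: the specification does not give a calling rule, and the `min_ratio` threshold of 1.2, the function names and the use of "N" for an ambiguous position are all hypothetical. Intensities are keyed here by the target base each probe would match perfectly.

```python
def call_base(intensities: dict, min_ratio: float = 1.2) -> str:
    """Call the target base from the four probe intensities in one column.

    Returns the base of the brightest probe if it exceeds the runner-up
    by at least min_ratio, otherwise "N" (ambiguous).
    """
    ranked = sorted(intensities.items(), key=lambda kv: kv[1], reverse=True)
    (best_base, best), (_, second) = ranked[0], ranked[1]
    if best <= 0:
        return "N"                      # no usable signal
    if second > 0 and best / second < min_ratio:
        return "N"                      # two probes too close to separate
    return best_base


def consensus_call(calls: list) -> str:
    """Combine calls from several probe sizes at one target position.

    A base is accepted when every unambiguous call agrees on it.
    """
    bases = {c for c in calls if c != "N"}
    return bases.pop() if len(bases) == 1 else "N"


column_11mer = {"A": 100.0, "C": 95.0, "G": 10.0, "T": 5.0}   # ambiguous
column_17mer = {"A": 400.0, "C": 60.0, "G": 30.0, "T": 20.0}  # clear A
calls = [call_base(column_11mer), call_base(column_17mer)]
print(calls, consensus_call(calls))
```

A position left ambiguous by a short probe can thus still be read from a longer one, in line with the observation that the longer probes yielded more sequence.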

Many of the short ambiguous regions in the target arise 
in segments of the target flanking the points at which the SF2 
and BRU sequences diverge. These ambiguities arise because in 
these regions the comparison of hybridization signals is not 
drawn between perfectly matched and single base mismatch 
probes but between a single-mismatched probe and three probes 
having two mismatches. These ambiguities in reading an SF2 
sequence would not detract from the chip's ability to read a 
BRU sequence either alone or in a mixture with an SF2 target 
sequence. 

In a variation of the above procedure, the chip was
treated with RNase after hybridization of the pPol19 target to
the probes. Addition of RNase digests mismatched target and
thereby increases the signal to noise ratio. RNase treatment
increased the number of correctly read bases to 743/821 or 90%
(combining the data from the four groups of probes).

In a further variation, the RNA target was replaced with
a DNA target containing the same segment of the HIV genome.
The DNA target was prepared by linear amplification using Taq
polymerase, RT#1-T3 primer, and fluorescein d-UTP label. The
DNA target was fragmented with uracil DNA glycosylase and heat
treatment. The hybridization pattern across the array and
percentage of readable sequence were similar to those obtained
using an RNA target. However, there were a few regions of
sequence that could be read from the RNA target that could not
be read from the DNA target and vice versa.






(b) HV 407 Chip

The 407 chip was designed according to the same
principles as the HV 273 chip, but differs in several
respects. First, the oligonucleotide probes on this chip are
designed to exhibit perfect sequence identity (with the
exception of the interrogation position on each probe) to the
HIV strain SF2 (rather than the BRU strain as was the case for
the HV 273 chip). Second, the 407 chip contains 13 mers, 15
mers, 17 mers and 19 mers (with interrogation positions at
nucleotide 7, 8, 9 and 10 respectively), rather than the 11
mers, 13 mers, 15 mers and 17 mers on the HV 273 chip. Third,
the different sized groups of oligomers are arranged in
parallel in place of the in-series arrangement on the HV 273
chip. In the parallel arrangement, the chip contains from top
to bottom a row of 13 mers, a row of 15 mers, a row of 17
mers, a row of 19 mers, followed by a further row of 13 mers,
a row of 15 mers, a row of 17 mers, a row of 19 mers, followed
by a row of 13 mers, and so forth. Each row contains 4 lanes
of probes, an A lane, a C lane, a G lane and a T lane, as
described above. The probes in each lane tile across the
reference sequence. The layout of probes on the HV 407 chip is
shown in Fig. 10.

The 407 chip was separately tested for its ability to
sequence two targets, pPol19 RNA and 4MUT18 RNA. pPol19
contains an 831 bp fragment from the SF2 reverse transcriptase
gene which exhibits perfect complementarity to the probes on
the 407 chip (except of course for the interrogation positions
in three of the probes in each column). 4MUT18 differs from
the reference sequence at thirty-one positions within the
target, including five positions in codons 67, 70, 215 and 219
associated with acquisition of drug resistance. Target RNA
was prepared, labelled and fragmented as described above and
hybridized to the HV 407 chip. The hybridization pattern for
the pPol19 target is shown in Fig. 11.

The sequences read off the chip for the pPol19 and 4MUT18
targets are both shown in Fig. 12 (although the two sequences
were determined in different experiments). The sequence
labelled wildtype in the Figure is the reference sequence.
The four lanes of sequence immediately below the reference
sequence are the respective sequences read from the four-sized
groups of probes for the pPol19 target (from top-to-bottom 13
mers, 15 mers, 17 mers and 19 mers). The next four lanes of
sequence are the sequences read from the four-sized groups of
probes for the 4MUT18 target (from top-to-bottom in the same
order). The regions of sequences shown in normal type are
those that could be read unambiguously from the chip. Regions
where sequence could not be accurately read are shown
highlighted. Some regions of sequence that could not be read
from one sized set of probes could be read from another.

Taking the best result from the four sized groups of
probes at each column position, about 97% of bases in the
pPol19 sequence and about 90% of bases in the 4MUT18 sequence
were read accurately. Of the 31 nucleotide differences
between 4MUT18 and the reference sequence, twenty-seven were
read correctly including three of the nucleotide changes
associated with acquisition of drug resistance. Of the
ambiguous regions in the 4MUT18 sequence determination, most
occurred in the 4MUT18 segments flanking points of divergence
between the 4MUT18 and reference sequences. Notably, most of
the common mutations in HIV reverse transcriptase associated
with drug resistance (see Table 3) occur at sequence positions
that can be read from the chip. Thus, most of the commonly
occurring mutations can be detected by a chip containing an
array of probes based on a single reference sequence.

Comparison of the sequences read from probes of
different sizes is useful in determining the optimum size
probe to use for different regions of the target. The
strategy of customizing probe length within a single group of
probe sets minimizes the total number of probes required to
read a particular target sequence. This leaves ample capacity
for the chip to include probes to other reference sequences
(e.g., 16S RNA for pathogenic microorganisms) as discussed
below.

The HV 407 chip has also been tested for its capacity to
detect mixtures of different HIV strains. The mixture
comprises varying proportions of two target sequences: one a
segment of a reverse transcriptase gene from a wildtype SF2
strain, the other a corresponding segment from an SF2 strain
bearing a codon 67 mutation. See Fig. 13. The Figure also
represents the probes on the chip having an interrogation
position for reading the nucleotide in which the mutation
occurs. A single probe in the Figure represents four probes
on the chip with the symbol (o) indicating the interrogation
position, which differs in each of the four probes. Figure 14
shows the fluorescence intensity for the four 13 mers and the
four 15 mers having an interrogation position for reading the
nucleotide in the target sequence in which the mutation
occurs. As the percentage of mutant target is increased, the
fluorescence intensity of the probe exhibiting perfect
complementarity to the wildtype target decreases, and the
intensity of the probe exhibiting perfect complementarity to
the mutant sequence increases. The intensities of the other
two probes do not change appreciably. It is concluded that
the chip can be used to analyze simultaneously a mixture of
strains, and that a strain comprising as little as ten percent
of a mixture can be easily detected.
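
The trend described above suggests a simple way to estimate the proportion of mutant target from the two informative probes. The sketch below is an assumption-laden illustration, not a method taken from the specification: it supposes that background-corrected signal is proportional to the amount of perfectly matched target and that both probes hybridize with equal efficiency. The function name and the `background` parameter are hypothetical.

```python
def mutant_fraction(wildtype_signal: float, mutant_signal: float,
                    background: float = 0.0) -> float:
    """Estimate the mutant share of a two-strain mixture.

    Uses the intensities of the probe perfectly matched to the wildtype
    target and the probe perfectly matched to the mutant target, after
    subtracting a common background value.
    """
    wt = max(wildtype_signal - background, 0.0)
    mut = max(mutant_signal - background, 0.0)
    total = wt + mut
    if total == 0:
        raise ValueError("no signal above background")
    return mut / total


print(mutant_fraction(900.0, 100.0))                    # 0.1
print(round(mutant_fraction(110.0, 20.0, 10.0), 3))     # 0.091
```

In practice the background could be taken from the two probes in the column whose intensities do not change appreciably with the mixture.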

(c) Protease Chip

A protease chip was constructed using the basic tiling
strategy. The chip comprises four probe sets tiling across a
382 nucleotide span including 297 nucleotides from the protease
coding sequence. The reference sequence was a consensus Clade
B HIV protease sequence. Different probe lengths were
employed for tiling different regions of the reference
sequence. Probe lengths were 11, 14, 17 and 20 nucleotides
with interrogation positions at or adjacent to the center of
each probe. Lengths were optimized from prior hybridization
data employing a chip having multiple tilings, each with a
different probe length.

The chip was hybridized to four different single-stranded
DNA protease target sequences (HXB2, SF2, NY5, pPol4mut18).
Both sense and antisense strands were sequenced. Data from
the chip were compared with those from an ABI sequencer. The
overall accuracy from sequencing the four targets is
illustrated in Table 5 below.
Table 5

                      ABI                 Protease Chip
               Sense    Antisense      Sense    Antisense
No call          0          4            9          4
Ambiguous        6         14           17          8
Wrong call       2          3            3          1
TOTAL            8         21           29         13

ABI (sense) - 99.5%
Chip (sense) - 98.1%

ABI (antisense) - 98.6%
Chip (antisense) - 99.1%
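
The percentages in Table 5 can be checked against the error counts. The sketch below assumes, since the specification does not state the denominator, that each strand total covers the 382-nucleotide tiled span for all four targets (4 x 382 = 1528 bases); under that assumption the computed figures agree with the tabulated ones. The variable names are illustrative.

```python
SPAN = 382      # nucleotides tiled on the protease chip
TARGETS = 4     # HXB2, SF2, NY5, pPol4mut18

# (no call, ambiguous, wrong call) per method and strand, from Table 5
errors = {
    ("ABI", "sense"): (0, 6, 2),
    ("ABI", "antisense"): (4, 14, 3),
    ("Chip", "sense"): (9, 17, 3),
    ("Chip", "antisense"): (4, 8, 1),
}


def accuracy_percent(counts: tuple, bases: int = SPAN * TARGETS) -> float:
    """Percentage of bases read correctly, to one decimal place."""
    return round(100.0 * (1 - sum(counts) / bases), 1)


for (method, strand), counts in errors.items():
    print(f"{method} ({strand}): {sum(counts)} errors, {accuracy_percent(counts)}%")
# ABI (sense): 8 errors, 99.5%
# ABI (antisense): 21 errors, 98.6%
# Chip (sense): 29 errors, 98.1%
# Chip (antisense): 13 errors, 99.1%
```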



Combining the data from sense and antisense strands, both the
chip and the ABI sequencer provided 100% accurate data for all
of the sequence from all four clones.

In a further test, the chip was hybridized to protease
target sequences from viral isolates obtained from four
patients before and after ddI treatment. The sequence read
from the chip is shown in Fig. 15. Several mutations
(indicated by arrows) have arisen in the samples obtained
posttreatment. Particularly noteworthy was the chip's
capacity to read a g/a mutation at nucleotide 207,
notwithstanding the presence of two additional mutations (gt)
at adjacent positions.



B. Cystic Fibrosis Chips 

A number of years ago, cystic fibrosis, the most common
severe autosomal recessive disorder in humans, was shown to be
associated with mutations in a gene thereafter named the
Cystic Fibrosis Transmembrane Conductance Regulator (CFTR)
gene. The CFTR gene is about 250 kb in size and has 27 exons.
Wildtype genomic sequence is available for all exonic regions
and exon/intron boundaries (Zielenski et al., Genomics 10,
214-228 (1991)). The full-length wildtype cDNA sequence has
also been described (see Riordan et al., Science 245,
1059-1065 (1989)). Over 400 mutations have been mapped (see
Tsui et al., Hu. Mutat. 1, 197-203 (1992)). Many of the more
common mutations are shown in Table 6. The most common cystic
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TCTCCTTGGATATACTTGTGTGAATCAA 
TCACCAGATTTCGTAGTCTTTTCATA 
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OLIGO NUMBER 



784 



785 



791 



792 



1013 



SEQUENCE 



AATTGTGAAATTGTCTGCCATTCTTAA 



GATTCACTTACTGAACACAGTCTAAC 
AGGCTTCTCAGTGATCTGTTG 



1012 
766 



1065 




GTACCTGTTGCTCCAGGTATGTT 



Other primers can be readily devised from the known
genomic and cDNA sequences of CFTR. The selection of
primers, of course, depends on the areas of the target
sequence that are to be screened. The choice of primers also
depends on the strand to be amplified. For some regions of
the CFTR gene, it makes little difference to the hybridization
signal whether the coding or noncoding strand is used. In
other regions, one strand may give better discrimination in
hybridization signals between matched and mismatched probes
than the other. The upper limit in the length of a segment
that can be amplified from one pair of PCR primers is about 50
kb. Thus, for analysis of mutants through all or much of the
CFTR gene, it is often desirable to amplify several segments
from several paired primers. The different segments may be
amplified sequentially or simultaneously by multiplex PCR.
Frequently, fifteen or more segments of the CFTR gene are
simultaneously amplified by PCR. The primers and
amplification conditions are preferably selected to generate
DNA targets. An asymmetric labelling strategy incorporating
fluorescently labelled dNTPs for random labelling and dUTP for
target fragmentation to an average length of [illegible]
bases is preferred. The use of dUTP and fragmentation with
uracil N-glycosylase has the added benefit of eliminating
carry over between samples.

Mutations in the CFTR gene can be detected by any of the
tiling strategies noted above. The block tiling strategy is
one particularly useful approach. In this strategy, a group
(or block) of probes is used to analyze a short segment of
contiguous nucleotides (e.g., 3, 5, 7 or 9) from a CFTR gene
centered around the site of a mutation. The probes in a group
are sometimes referred to as constituting a block because all
probes in the group are usually identical except at
interrogation positions. As noted above, the probes may also
differ in the presence of leading or trailing sequences
flanking regions of complementarity. However, for ease of
illustration, it will be assumed that such sequences are not
present. As an example, to analyze a segment of five
contiguous nucleotides from the CFTR gene, including the site
of a mutation (such as one of the mutations in Table 6), a
block of probes usually contains at least one wildtype probe
and five sets of mutant probes, each having three probes. The
wildtype probe has five interrogation positions corresponding
to the five nucleotides being analyzed from the reference
sequence. However, the identity of the interrogation
positions is only apparent when the structure of the wildtype
probe is compared with that of the probes in the five mutant
probe sets. The first mutant probe set comprises three
probes, each being identical to the wildtype probe, except in
the first interrogation position, which differs in each of the
three mutant probes and the wildtype probe. The second
through fifth mutant probe sets are similarly composed except
that the differences from the wildtype probe occur in the
second through fifth interrogation position respectively.
Note that in practice, each set of mutant probes is sometimes
laid down on the chip juxtaposed with an associated wildtype
probe. In this situation, a block would comprise five
wildtype probes, each effectively providing the same
information. However, visual inspection and [illegible]
analysis of the chip is facilitated by the largely redundant
information provided by five wildtype probes.
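
The composition of a block can be illustrated with a short Python sketch. The function name `block_probes` and the toy 15-base probe are hypothetical, the interrogation positions are assumed to be the central run of the probe, and sequences are written in the sense of the reference for readability (actual probes would be complementary to the target).

```python
def block_probes(wildtype: str, n_positions: int = 5) -> dict:
    """Build one block: a wildtype probe plus n_positions mutant probe sets.

    Each mutant set holds the three probes that differ from the wildtype
    probe at a single interrogation position.
    """
    start = (len(wildtype) - n_positions) // 2   # first interrogation position
    mutant_sets = []
    for pos in range(start, start + n_positions):
        variants = [wildtype[:pos] + base + wildtype[pos + 1 :]
                    for base in "ACGT" if base != wildtype[pos]]
        mutant_sets.append(variants)
    return {"wildtype": wildtype, "mutant_sets": mutant_sets}


block = block_probes("ACGTACGTACGTACG")
n_mutant = sum(len(s) for s in block["mutant_sets"])
print(1 + n_mutant)                         # 16 probes: 1 wildtype + 5 x 3 mutant
print(len(block["mutant_sets"]) * 4)        # 20 if a wildtype probe accompanies each set
print(block["mutant_sets"][0])
```

The second figure corresponds to the layout in which each mutant set is juxtaposed with its own copy of the wildtype probe.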
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After hybridization to labelled target, the relative 
hybridization signals are read from the probes. Comparison of 
the intensities of the three probes in the first mutant probe 
set with that of the wildtype probe indicates the identity of 
the nucleotide in the target sequence corresponding to the 
first interrogation position. Comparison of the intensities 
of the three probes in the second mutant probe set with that 
of the wildtype probe indicates the identity of the nucleotide 
in the target sequence corresponding to the second 
interrogation position, and so forth. Collectively, the 
relative hybridization intensities indicate the identity of 
each of the five contiguous nucleotides in the reference 
sequence. 

In a preferred embodiment, a first group (or block) of 
probes is tiled based on a wildtype reference sequence and a 
second group is tiled based on a mutant version of the wildtype
reference sequence. The mutation can be a point mutation,
insertion or deletion or any combination of these. The 
combination of first and second groups of probes facilitates 
analysis when multiple target sequences are simultaneously 
applied to the chip, as is the case when a patient being 
diagnosed is heterozygous for the CFTR allele. 

The above strategy is illustrated in Fig. 16, which shows 
two groups of probes tiled for a wildtype reference sequence 
and a point mutation thereof. The five mutant probe sets for 
the wildtype reference sequence are designated wt1-5, and the
five mutant probe sets for the mutant reference sequence are
designated m1-5. The letter N indicates the interrogation
position, which shifts by one position in successive probe 
sets from the same group. The figure illustrates the 
hybridization pattern obtained when the chip is hybridized 
with a homozygous wildtype target sequence comprising 
nucleotides n-2 to n+2, where n is the site of a mutation. 
For the group of probes tiled based on the reference sequence, 
four probes are compared at each interrogation position. At 
each position, one of the four probes exhibits a perfect match 
with the target, and the other three exhibit a single-base 
mismatch. For the group of probes tiled based on the mutant
reference sequence, again four probes are compared at each
interrogation position. At position n, one probe exhibits a
perfect match, and three probes exhibit a single base
mismatch. Hybridization to a homozygous mutant yields an
analogous pattern, except that the respective hybridization
patterns of probes tiled on the wildtype and mutant reference
sequences are reversed.

The hybridization pattern is very different when the chip
is hybridized with a sample from a patient who is heterozygous
for the mutant allele (see Fig. 17). For the group of probes
tiled based on the wildtype sequence, at all positions but n,
one probe exhibits a perfect match at each interrogation
position, and the other three probes exhibit a one base
mismatch. At position n, two probes exhibit a perfect match
(one for each allele), and the other probes exhibit single-
base mismatches. For the group of probes tiled on the mutant
sequence, the same result is obtained. Thus, the heterozygote
point mutant is easily distinguished from both the homozygous
wildtype and mutant forms by the identity of hybridization
patterns from the two groups of probes.
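
The three patterns described above (homozygous wildtype, homozygous mutant, heterozygous) can be reproduced with a small model. This sketch is a deliberate simplification and all names in it are hypothetical: probes are represented by the five-nucleotide segment n-2 to n+2 written in the sense of the target, and only perfect matches are counted, whereas real signals also include weaker hybridization to mismatched probes.

```python
def tile_group(reference: str) -> list:
    """Four probes (A, C, G, T) at every interrogation position of a segment."""
    return [[reference[:i] + base + reference[i + 1 :] for base in "ACGT"]
            for i in range(len(reference))]


def perfect_match_counts(group: list, alleles: set) -> list:
    """Number of probes per position that perfectly match some allele in the sample."""
    return [sum(probe in alleles for probe in probe_set) for probe_set in group]


def classify(wt_ref: str, mut_ref: str, alleles: set) -> str:
    """Infer zygosity from the patterns of the wildtype- and mutant-tiled groups."""
    wt_counts = perfect_match_counts(tile_group(wt_ref), alleles)
    mut_counts = perfect_match_counts(tile_group(mut_ref), alleles)
    if wt_counts == mut_counts and 2 in wt_counts:
        return "heterozygous"            # identical patterns, two matches at n
    if all(c == 1 for c in wt_counts):
        return "homozygous wildtype"
    if all(c == 1 for c in mut_counts):
        return "homozygous mutant"
    return "undetermined"


wt, mut = "ACGTA", "ACTTA"               # point mutation at the central position n
print(perfect_match_counts(tile_group(wt), {wt}))        # [1, 1, 1, 1, 1]
print(perfect_match_counts(tile_group(mut), {wt}))       # [0, 0, 1, 0, 0]
print(perfect_match_counts(tile_group(wt), {wt, mut}))   # [1, 1, 2, 1, 1]
print(classify(wt, mut, {wt, mut}))                      # heterozygous
```

The heterozygote is the only case in which both groups of probes give the same pattern, which is the distinguishing feature noted in the text.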

Typically, a chip comprises several paired groups of
probes, each pair for detecting a particular mutation. For
example, some chips contain 5, 10, 20, 40 or 100 paired groups
of probes for detecting the corresponding numbers of
mutations. Some chips are customized to include paired groups
of probes for detecting all mutations common in particular
populations (see Table 6). Chips usually also contain control
probes for verifying that correct amplification has occurred
and that the target is properly labelled.

The goal of the tiling strategy described above is to
focus on short regions of the CFTR gene flanking the sites
of known mutation. Other tiling strategies analyze much
larger regions of the CFTR gene, and are appropriate for
locating and identifying hitherto uncharacterized mutations.
For example, the entire genomic CFTR gene (250 kb) can be
tiled by the basic tiling strategy from an array of about one
million probes. Synthesis and scanning of such an array of
probes is entirely feasible. Other tiling strategies, such as
the block tiling, multiplex tiling or pooling, can cover the
entire gene with fewer probes. Some tiling strategies analyze
some or all of the components of the CFTR gene, such as the
cDNA coding sequence or individual exons. Analysis of exons
10 and 11 is particularly informative because these are the
location of many common mutations including the ΔF508
mutation.

Exemplary CFTR Chips

One illustrative chip bears an array of 1296 probes
covering the full length of exon 10 of the CFTR gene arranged
in a 36 x 36 array of 356 µm elements. The probes in the
array can have any length, preferably in the range of from 10
to 18 residues, and can be used to detect and sequence any
single-base substitution and any deletion within the 192-base
exon, including the three-base deletion known as ΔF508. As
described in detail below, hybridization of nanomolar
concentrations of wild-type and ΔF508 oligonucleotide target
nucleic acids labeled with fluorescein to these arrays
produces highly specific signals (detected with confocal
scanning fluorescence microscopy) that permit discrimination
between mutant and wild-type target sequences in both
homozygous and heterozygous cases.

Sets of probes of a selected length in the range of from
10 to 18 bases and complementary to subsequences of the known
wild-type CFTR sequence are synthesized starting at a position
a few bases into the intron on the 5'-side of exon 10 and
ending a few bases into the intron on the 3'-side. There is a
probe for each possible subsequence of the given segment of
the gene, and the probes are organized into a "lane" in such a
way that traversing the lane from the upper left-hand corner
of the chip to the lower right-hand corner corresponds to
traversing the gene segment base-by-base from the 5'-end. The
lane containing that set of probes is, as noted above, called
the "wild-type lane."

Relative to the wild-type lane, a "substitution" lane,
called the "A-lane," was synthesized on the chip. The A-lane
probes were identical in sequence to an adjacent (immediately
below the corresponding) wild-type probe but contained,
regardless of the sequence of the wild-type probe, a dA
residue at position 7 (counting from the 3'-end). In similar
fashion, substitution lanes with replacement bases dC, dG, and
dT were placed onto the chip in a "C-lane," a "G-lane," and a
"T-lane," respectively. A sixth lane on the chip consisted of
probes identical to those in the wild-type lane but for the
deletion of the base in position 7 and restoration of the
original probe length by addition to the 5'-end of the base
complementary to the gene at that position.
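
The six-lane layout described above can be sketched in a few lines of Python. This is a minimal illustration, not the synthesis procedure itself: the function and variable names are hypothetical, probes are written 3'->5' so that they line up base-for-base with a target written 5'->3', and the example target is the 39-mer wild-type oligonucleotide given later in this section.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def tile_lanes(target: str, probe_len: int = 15, sub_pos: int = 7) -> list[dict[str, str]]:
    """Return one dict of lanes per column; every probe is written 3'->5'.

    `target` is written 5'->3'. `sub_pos` is the 1-based substitution
    position counted from the probe's 3'-end.
    """
    idx = sub_pos - 1
    columns = []
    for start in range(len(target) - probe_len + 1):
        window = target[start:start + probe_len]
        wild_type = "".join(COMPLEMENT[b] for b in window)
        lanes = {"wt": wild_type}
        # Substitution lanes: same probe with A, C, G or T forced at sub_pos.
        for base in "ACGT":
            lanes[base] = wild_type[:idx] + base + wild_type[idx + 1:]
        # Deletion lane: drop the base at sub_pos and restore the length by
        # extending the 5'-end with the complement of the next target base.
        if start + probe_len < len(target):
            extra = COMPLEMENT[target[start + probe_len]]
            lanes["del"] = wild_type[:idx] + wild_type[idx + 1:] + extra
        columns.append(lanes)
    return columns


wt508 = "CATTAAAGAAAATATCATCTTTGGTGTTTCCTATGATGA"
columns = tile_lanes(wt508)
print(columns[9]["wt"])  # TTTATAGTAGAAACC
```

In this sketch one of the four substitution probes in every column is identical to the wild-type probe, which is what allows the substitution lanes alone to report the wild-type sequence.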

The four substitution lanes enable one to deduce the 
sequence of a target exon 10 nucleic acid from the relative 
intensities with which the target hybridizes to the probes in 
the various lanes. Various versions of such exon 10 DNA chips 
were made as described above with probes 15 bases long, as 
well as chips with probes 10, 14, and 18 bases long. For the 
results described below, the probes were 15 bases long, and 
the position of substitution was 7 from the 3'-end.



The sequences of several important probes are shown
below. In each case, the letter "X" stands for the
interrogation position in a given column set, so each of the
sequences actually represents four probes, with A, C, G, and
T, respectively, taking the place of the "X." Sets of shorter
probes derived from the sets shown below by removing up to
five bases from the 5'-end of each probe, and sets of longer
probes made from this set by adding up to three bases from the
exon 10 sequence to the 5'-end of each probe, are also useful
and provided by the invention.
3'-TTTATAXTAGAAACC
3'-TTATAGXAGAAACCA
3'-TATAGTXGAAACCAC
3'-ATAGTAXAAACCACA
3'-TAGTAGXAACCACAA
3'-AGTAGAXACCACAAA
3'-GTAGAAXCCACAAAG
3'-TAGAAAXCACAAAGG
3'-AGAAACXACAAAGGA


To demonstrate the ability of the chip to distinguish the
ΔF508 mutation from the wild-type, two synthetic target
nucleic acids were made. The first, a 39-mer complementary to
a subsequence of exon 10 of the CFTR gene having the three
bases involved in the ΔF508 mutation near its center, is
called the "wild-type" or wt508 target, corresponds to
positions 111-149 of the exon, and has the sequence shown
below:

5'-CATTAAAGAAAATATCATCTTTGGTGTTTCCTATGATGA.

The second, a 36-mer probe derived from the wild-type target
by removing those same three bases, is called the "mutant"
target or mu508 target and has the sequence shown below, first
with dashes to indicate the deleted bases, and then without
dashes but with one base underlined (to indicate the base
detected by the T-lane probe, as discussed below):

5'-CATTAAAGAAAATATCAT---TGGTGTTTCCTATGATGA;

5'-CATTAAAGAAAATATCATTGGTGTTTCCTATGATGA.

Both targets were labeled with fluorescein at the 5 '-end. 

In three separate experiments, the wild-type target, the
mutant target, and an equimolar mixture of both targets were
exposed (0.1 nM wt508, 0.1 nM mu508, and 0.1 nM wt508 plus 0.1
nM mu508, respectively, in a solution compatible with nucleic
acid hybridization) to a CF chip. The hybridization mixture
was incubated overnight at room temperature, and then the chip
was scanned on a reader (a confocal fluorescence microscope in
photon-counting mode; images of the chip were constructed
from the photon counts) at several successively higher
temperatures while still in contact with the target solution.
After each temperature change, the chip was allowed to
equilibrate for approximately one-half hour before being
scanned. After each set of scans, the chip was exposed to
denaturing solvent and conditions to wash the chip, i.e., to
remove target that had bound, so that the next experiment
could be done with a clean chip.

The results of the experiments are shown in Figures 18,
19, 20, and 21. Figure 18, in panels A, B, and C, shows an
image made from the region of a DNA chip containing CFTR exon
10 probes; in panel A, the chip was hybridized to a wild-type
target; in panel C, the chip was hybridized to a mutant ΔF508
target; and in panel B, the chip was hybridized to a mixture
of the wild-type and mutant targets. Figure 19, in sheets 1-3,
corresponding to panels A, B, and C of Figure 18, shows
graphs of fluorescence intensity versus tiling position. The
labels on the horizontal axis show the bases in the wild-type
sequence corresponding to the position of substitution in the
respective probes. Plotted are the intensities observed from
the features (or synthesis sites) containing wild-type probes,
the features containing the substitution probes that bound the
most target ("called"), and the feature containing the
substitution probes that bound the target with the second
highest intensity of all the substitution probes ("2nd
Highest").

These figures show that, for the wild-type target and the
equimolar mixture of targets, the substitution probe with a
nucleotide sequence identical to the corresponding wild-type
probe bound the most target, allowing for an unambiguous
assignment of target sequence as shown by letters near the
points on the curve. The target wt508 thus hybridized to the
probes in the wild-type lane of the chip, although the
strength of the hybridization varied from probe-to-probe,
probably due to differences in melting temperature. The
sequence of most of the target can thus be read directly from
the chip, by inference from the pattern of hybridization in
the lanes of substitution probes (if the target hybridizes
most intensely to the probe in the A-lane, then one infers
that the target has a T in the position of substitution, and
so on).
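
The inference rule just described (brightest substitution lane, then take the complementary base) can be expressed as a short function. This is a sketch under stated assumptions: the function name is hypothetical and the intensity values in the example are invented for illustration, not measured data.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def call_base(lane_intensities: dict[str, float]) -> tuple[str, float, float]:
    """Call the target base for one column set.

    `lane_intensities` maps the probe base at the position of substitution
    ("A", "C", "G", "T") to the observed fluorescence intensity. Returns the
    called target base, the highest intensity ("called") and the second
    highest intensity ("2nd Highest").
    """
    ranked = sorted(lane_intensities.items(), key=lambda item: item[1], reverse=True)
    (best_lane, best), (_, second) = ranked[0], ranked[1]
    return COMPLEMENT[best_lane], best, second


# Hypothetical photon counts for a single column set.
example = {"A": 900.0, "C": 40.0, "G": 55.0, "T": 30.0}
print(call_base(example))  # ('T', 900.0, 55.0)
```

The gap between the highest and second highest intensity gives a rough measure of how confidently a base can be called at that position.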

For the mutant target, the sequence could similarly be
called on the 3'-side of the deletion. However, the intensity
of binding declined precipitously as the point of substitution
approached the site of the deletion from the 3'-end of the
target, so that the binding intensity on the wild-type probe
whose point of substitution corresponds to the T at the 3'-end
of the deletion was very close to background. Following that
pattern, the wild-type probe whose point of substitution
corresponds to the middle base (also a T) of the deletion
bound still less target. However, the probe in the T-lane of
that column set bound the target very well. Examination of
the sequences of the two targets reveals that the deletion
places an A at that position when the sequences are aligned at
their 3'-ends and that the T-lane probe is complementary to
the mutant target with but two mismatches near an end (shown
below in lower-case letters, with the position of substitution
underlined):

Target: 5'-CATTAAAGAAAATATCATTGGTGTTTCCTATGATGA

Probe:            3'-TagTAGTAACCACAA

Thus the T-lane probe in that column set calls the correct
base from the mutant sequence. Note that, in the graph for
the equimolar mixture of the two targets, that T-lane probe
binds almost as much target as does the A-lane probe in the
same column set, whereas in the other column sets, the probes
that do not have wild-type sequence do not bind target at all
as well. Thus, that one column set, and in particular the
T-lane probe within that set, detects the ΔF508 mutation under
conditions that simulate the homozygous case and also
conditions that simulate the heterozygous case.
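
The alignment of the T-lane probe against the mutant target can be checked mechanically. The sketch below slides a probe (written 3'->5') along a target (written 5'->3') and reports the offset with the fewest mismatches; the helper name is hypothetical, and simple mismatch counting is only a stand-in for real hybridization behaviour, which also depends on temperature and base composition.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def best_alignment(target: str, probe_3to5: str) -> tuple[int, list[int]]:
    """Return (offset, mismatch positions) for the best ungapped alignment.

    Mismatch positions are 1-based and counted from the probe's 3'-end.
    Ties are resolved in favour of the smallest offset.
    """
    best = None
    for offset in range(len(target) - len(probe_3to5) + 1):
        window = target[offset:offset + len(probe_3to5)]
        mismatches = [
            i + 1
            for i, (t, p) in enumerate(zip(window, probe_3to5))
            if COMPLEMENT[t] != p
        ]
        if best is None or len(mismatches) < len(best[1]):
            best = (offset, mismatches)
    return best


mu508 = "CATTAAAGAAAATATCATTGGTGTTTCCTATGATGA"
t_lane_probe = "TAGTAGTAACCACAA"
print(best_alignment(mu508, t_lane_probe))  # (10, [2, 3])
```

The two mismatches fall at probe positions 2 and 3 from the 3'-end, matching the lower-case letters in the alignment shown above.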

Although in this example the sequence could not be
reliably deduced near the ends of the target, where there is
not enough overlap between target and probe to allow effective
hybridization, and around the center of the target, where
hybridization was weak for some other reason, perhaps high
AT-content, the results show the method and the probes of the
invention can be used to detect the mutation of interest. The
mutant target gave a pattern of hybridization that was very
similar to that of the wt508 target at the ends, where the two
share a common sequence, and very different in the middle,
where the deletion is located. As one scans the image from
right to left, the intensity of hybridization of the target to
the probes in the wild-type lane drops off much more rapidly
near the center of the image for mu508 than for wt508; in
addition, there is one probe in the T-lane that hybridizes
intensely with mu508 and hardly at all with wt508. The
results from the equimolar mixture of the two targets, which
represents the case one would encounter in testing a
heterozygous individual for the mutation, are a blend of the
results for the separate targets, showing the power of the
invention to distinguish a wild-type target sequence from one
containing the ΔF508 mutation and to detect a mixture of the
two sequences.

The results above clearly demonstrate how the DNA chips
of the invention can be used to detect a deletion mutation,
ΔF508; another model system was used to show that the chips
can also be used to detect a point mutation. One
mutation in the CFTR gene is G480C, which involves the
replacement of the G in position 46 of exon 10 by a T,
resulting in the substitution of a cysteine for the glycine
normally in position #480 of the CFTR protein. The model
target sequences included the 21-mer probe wt480 to represent
the wild-type sequence at positions 37-55 of exon 10:
5'-CCTTCAGAGGGTAAAATTAAG, and the 21-mer probe mu480 to
represent the mutant sequence:
5'-CCTTCAGAGTGTAAAATTAAG.
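
As a quick consistency check, the two 21-mers can be compared base by base to locate the substitution. The sketch below assumes that the first base of each 21-mer corresponds to position 37 of exon 10, as the stated range suggests; the function name is hypothetical.

```python
def point_differences(wild_type: str, mutant: str, first_position: int = 1) -> list[tuple[int, str, str]]:
    """List (position, wild-type base, mutant base) wherever two sequences differ."""
    if len(wild_type) != len(mutant):
        raise ValueError("sequences must have the same length")
    return [
        (first_position + i, w, m)
        for i, (w, m) in enumerate(zip(wild_type, mutant))
        if w != m
    ]


wt480 = "CCTTCAGAGGGTAAAATTAAG"
mu480 = "CCTTCAGAGTGTAAAATTAAG"
print(point_differences(wt480, mu480, first_position=37))  # [(46, 'G', 'T')]
```

Under that numbering assumption, the single difference is the G to T replacement at position 46 described in the text.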

In separate experiments, a DNA chip was hybridized to
each of the targets wt480 and mu480, respectively, and then
scanned with a confocal microscope. Figure 20, in panels A,
B, and C, shows an image made from the region of a DNA chip
containing CFTR exon 10 probes; in panel A, the chip was
hybridized to the wt480 target; in panel C, the chip was
hybridized to the mu480 target; and in panel B, the chip was
hybridized to a mixture of the wild-type and mutant targets.
Figure 21, in sheets 1-3, corresponding to panels A, B, and
C of Figure 20, shows graphs of fluorescence intensity versus
tiling position. The labels on the horizontal axis show the
bases in the wild-type sequence corresponding to the position
of substitution in the respective probes. Plotted are the
intensities observed from the features (or synthesis sites)
containing wild-type probes, the features containing the
substitution probes that bound the most target ("called"), and
the feature containing the substitution probes that bound the
target with the second highest intensity of all the
substitution probes ("2nd Highest").

These figures show that the chip could be used to
sequence a 16-base stretch from the center of the target wt480
and that discrimination against mismatches is quite good
throughout the sequenced region. When the chip was
exposed to the target mu480, only one probe in the portion of
the chip shown bound the target well: the probe in the set of
probes devoted to identifying the base at position 46 in exon
10 and that has an A in the position of substitution and so is
fully complementary to the central portion of the mutant
target. All other probes in that portion of the chip have at
least one mismatch with the mutant target. Nevertheless, the
sequence of mu480 for several positions to both sides of the
mutation can be read from the chip, albeit with weaker signals
than those observed with the wild-type target.

The results also show that, when the two targets were
mixed together and exposed to the chip, the hybridization
pattern observed was a combination of the other two patterns.
The wild-type sequence could easily be read from the chip,
but the probe that bound the mu480 target so well when only
the mu480 target was present also bound it well when both the
mutant and wild-type targets were present in a mixture, making
the hybridization pattern easily distinguishable from that of
the wild-type target alone. These results again show the
power of the DNA chips of the invention to detect point
mutations in both homozygous and heterozygous individuals.

To demonstrate clinical application of the DNA chips of
the invention, the chips were used to study and detect
mutations in nucleic acids from genomic samples. Genomic
samples from an individual carrying only the wild-type gene
and an individual heterozygous for ΔF508 were amplified by PCR
using exon 10 primers containing the promoter for T7 RNA
polymerase. Illustrative primers of the invention are shown
below.

Exon Name Sequence 

C«9-T7 TAATACGACTCACTATAGGGAGatgacctaataat ,. 
CPH0C-T7 TAATACGACTCACTATAGGGAGt, ! ! 9 g99ttt 

CFilO-T7 TAATACGACTCA^ggac 9 9aa999ttCatit9 ° 

Crillo-n TAATACGACTCACTA^gga "^"actaaaagtg.ctctc

These primers can be used to amplify exon 10 or exon 11
sequences; in another embodiment, multiplex PCR is employed,
using two or more pairs of primers to amplify more than one
exon at a time.

The product of amplification was then used as a template
for the RNA polymerase, with fluoresceinated UTP present to
label the RNA product. After sufficient RNA was made, it was
fragmented and applied to an exon 10 DNA chip for 15 minutes,
after which the chip was washed with hybridization buffer and
scanned with the fluorescence microscope. A useful positive
control included on many CF exon 10 chips is the 8-mer
3'-CGCCGCCG-5'. Figure 22, in panels A and B, shows an image
made from a region of a DNA chip containing CFTR exon 10
probes; in panel A, the chip was hybridized to nucleic acid
derived from the genomic DNA of an individual with wild-type
ΔF508 sequences; in panel B, the target nucleic acid
originated from a heterozygous (with respect to the ΔF508
mutation) individual. Figure 23, in sheets 1 and 2,
corresponding to panels A and B of Figure 22, shows graphs of
fluorescence intensity versus tiling position.

These figures show that the sequence of the wild-type RNA
can be called for most of the bases near the mutation. In the
case of the ΔF508 heterozygous carrier, one particular probe
in the T-lane, the same one that distinguished so clearly
between the wild-type and mutant oligonucleotide targets in
the model system described above, binds a large amount of
RNA, while the same probe binds little RNA from the wild-type
individual. These results show that the DNA chips of the
invention are capable of detecting the ΔF508 mutation in a
heterozygous carrier.

Further chips were constructed using the block tiling
strategy to provide an array of probes for analyzing a CFTR
mutation. The array comprised 93 µm x 96 µm features arranged
into eleven columns and four rows (44 total probes). Probes
in five of these columns were from four probe sets tiled based
on the wildtype CFTR sequence and having interrogation
positions corresponding to the site of a mutation and two
bases on either side. Five of the remaining columns contained
four sets of probes tiled based on the mutant version of the
CFTR sequence. These probe sets also had interrogation
positions corresponding to the site of mutation and two
nucleotides on either side. The eleventh column contained
four cells for control probes.
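
The eleven-column, four-row layout can be sketched as follows for a single-base substitution. Everything specific in this example is an assumption made for illustration: the probe length, the interrogation index, the column order (wild-type block, then mutant block, then controls), the placeholder control cells, the function names and the toy sequences are not taken from the specification.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
BASES = "ACGT"


def probe_columns(reference: str, site: int, probe_len: int = 15,
                  interrogation_index: int = 7, flank: int = 2) -> list[list[str]]:
    """Build one column of four probes for each position site-flank..site+flank.

    `reference` is written 5'->3' and `site` is a 0-based index into it.
    Probes are written 3'->5'; `interrogation_index` is 0-based within a probe.
    """
    columns = []
    for pos in range(site - flank, site + flank + 1):
        start = pos - interrogation_index
        if start < 0 or start + probe_len > len(reference):
            raise ValueError("reference too short around the mutation site")
        window = reference[start:start + probe_len]
        perfect = "".join(COMPLEMENT[b] for b in window)
        columns.append([
            perfect[:interrogation_index] + b + perfect[interrogation_index + 1:]
            for b in BASES
        ])
    return columns


def block_layout(wild_type: str, mutant: str, site: int, **kwargs) -> list[list[str]]:
    """Five wild-type columns, five mutant columns and one control column."""
    controls = [["control"] * len(BASES)]
    return (probe_columns(wild_type, site, **kwargs)
            + probe_columns(mutant, site, **kwargs)
            + controls)


# Toy reference with a hypothetical T->A substitution at index 15.
toy_wild_type = "ACGT" * 8
toy_mutant = toy_wild_type[:15] + "A" + toy_wild_type[16:]
layout = block_layout(toy_wild_type, toy_mutant, site=15)
print(len(layout), sum(len(column) for column in layout))  # 11 44
```

For a substitution, the wild-type and mutant columns that interrogate the mutation site itself contain the same four probes, because the two references differ only at the base being substituted.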

Fluorescently labeled hybridization targets were prepared
by PCR amplification. 100 µg of genomic DNA, 0.4 µM of each
primer, 50 µM each dATP, dCTP, dGTP and dUTP (Pharmacia) in
10 mM Tris-Cl, pH 8.3, 50 mM KCl, 2.5 mM MgCl2 and 2 U Taq
polymerase (Perkin-Elmer) were cycled 36 times using a
Perkin-Elmer 9600 thermocycler and the following times and
temperatures: 95°C, 10 sec; 55°C, 10 sec; 72°C, 30 sec. 10
µl of this reaction product was used as a template in a
second, asymmetric PCR reaction. Conditions included 1 µM
asymmetric PCR primer, 50 µM each dATP, dCTP, TTP, 25 µM
fluorescein-dGTP (DuPont), 10 mM Tris-Cl, pH 9.1, 75 mM KCl,
3.5 mM MgCl2. The reaction was cycled 5X with the following
conditions: 95°C, 10 sec; 60°C, 10 sec; 55°C, 1 min; and 72°C,
1.5 min. This was immediately followed with another 20 cycles
using the following conditions: 95°C, 10 sec; 60°C, 10 sec;
72°C, 1.5 min.

Amplification products were fragmented by treating
with 2 U of Uracil-N-glycosylase (Gibco) at 30°C for 30 min
followed by heat denaturation at 95°C for 5 min. Finally, the
labeled, fragmented PCR product was diluted into hybridization
buffer made up of 5X SSPE and 1 mM cetyltrimethylammonium
bromide (CTAB). The dilution factor ranged from 10x to 25x
with 40 µl of sample being diluted into 0.4 ml to 1 ml of
hybridization solution.
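
The quoted dilution range can be reproduced with a one-line calculation, assuming the stated 0.4 ml and 1 ml figures are final volumes. The function name is hypothetical.

```python
def dilution_factor(sample_ul: float, final_volume_ul: float) -> float:
    """Fold dilution when `sample_ul` of sample is brought to `final_volume_ul`."""
    if sample_ul <= 0:
        raise ValueError("sample volume must be positive")
    return final_volume_ul / sample_ul


print(dilution_factor(40, 400))   # 10.0
print(dilution_factor(40, 1000))  # 25.0
```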

Target hybridization was generally carried out with
the chip shaking in a small dish containing 500 µl to 1 ml
total volume of hybridization solution. All hybridizations
were done at 30°C constant temperature. Alternatively, some
hybridizations were carried out with chips enclosed in a
plastic package with the 1 cm x 1 cm chip glued facing a 250
µl fluid chamber. 250-350 µl of hybridization solution was
introduced and mixed using a syringe pump. Temperature was
controlled by interfacing the back surface of the package with
a Peltier heating/cooling device. Following hybridization,
chips were washed with 5X SSPE, 0.1% Triton X-100 at 25°C-30°C
prior to fluorescent image generation.

Hybridized, washed DNA chips were scanned for
fluorescence using a stage-scanning confocal epifluorescent
microscope and 488 nm argon ion laser excitation. Emitted
light was collected through a band pass filter centered at
530 nm. The resulting fluorescence image was spatially
reconstructed and intensity data were then analyzed. Features
with the peak fluorescence intensity in each column were
identified and compared with any signal intensity at the
remaining single base mismatch probe sites in the same column.
The sequences of the highest intensity features were then
compared across all ten columns of each sub-array to determine
whether peak intensity scores for the wild type sequence and
the mutant sequence were similar or significantly different.
These results were used to generate the genotype call of wild
type (high intensity signals only in wild type probe columns),
mutant (high intensity signals only in the mutant probe
columns) or heterozygous (high intensity signals in both the
wild type and mutant probe columns).
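
The genotype call described above can be sketched as a simple decision rule over the peak intensities of the wild type and mutant probe columns. The threshold (a fixed multiple of background), the requirement that every column in a block be high, the "no call" outcome and the function name are all assumptions made for this illustration; the specification does not state a numerical criterion here.

```python
def call_genotype(wt_peaks: list[float], mut_peaks: list[float],
                  background: float, min_ratio: float = 3.0) -> str:
    """Classify a sub-array from per-column peak intensities.

    A block of columns counts as "high" only if every column's peak is at
    least `min_ratio` times the background intensity.
    """
    threshold = min_ratio * background
    wt_high = min(wt_peaks) >= threshold
    mut_high = min(mut_peaks) >= threshold
    if wt_high and mut_high:
        return "heterozygous"
    if wt_high:
        return "wild type"
    if mut_high:
        return "mutant"
    return "no call"


# Hypothetical peak intensities for five wild type and five mutant columns.
print(call_genotype([600.0] * 5, [90.0] * 5, background=100.0))   # wild type
print(call_genotype([600.0] * 5, [550.0] * 5, background=100.0))  # heterozygous
```

Requiring every column in a block to be high keeps a single cross-hybridizing feature in an otherwise dark block from changing the call.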

Figure 24 (panel A) shows an image of the fluorescence
signals in arrays designed to detect the G551D(G>A) and
Q552X(C>T) CFTR mutations. The hybridization target is an
exon 11 amplicon generated from wild type genomic DNA. Wild
type hybridization patterns are evident at both locations. No
significant fluorescence signal resulted at any of the
features with probes complementary to mutant or mismatched
sequences. Relative fluorescence intensities were six-fold
brighter for the perfectly matched wildtype features compared
with the background signal intensity at mutant and mismatch
features. In addition, the sequence at these loci can be
confirmed as AGGTC and GTCAA, respectively, where the bold
type face indicates the mutation sites. Figure 24 (panel B)
shows the same probe array features after hybridization with a
fluorescent target generated from DNA heterozygous for the
G551D mutation. Both the wild type and mutant probe columns
have features with significant fluorescence intensity,
indicating the hybridization of both wild type and mutant CFTR
alleles at this site. Only wildtype probes hybridized with
any significant fluorescence signal in the Q552X subarray,
indicating a wild type target sequence. However, an
additional feature that did not hybridize in the first
experiment shows significant fluorescence intensity in this
experiment. Because the G551D and Q552X mutations are only
two bases apart, the probe sequence in the additional
feature has a matched overlap with the G551D mutant target.

Figure 25 (panels A and B) illustrates analysis of the
mutation ΔF508, a 3 base pair deletion in exon 10 of
the CFTR gene. In contrast to the hybridization pattern seen
in base change mutations, in mutations where bases are
inserted or deleted, probe arrays show a different
hybridization pattern. Identical probes are synthesized in
the two central columns of base substitution arrays. As a
result, either mutant or wild type target hybridizations
always result in two side-by-side features (a doublet) with
high fluorescence intensity at the center of the array. In a
heterozygous target hybridization, two sets of doublets, one
to the wild type sequence and one to the mutant sequence, occur
(Figure 24, panel B). In contrast, wild type and mutant probe
column sequences are offset from each other for deletion or
insertion mutations, and hybridization doublets are not seen.
Instead of the six high intensity signals with one doublet,
five independent features in alternating columns characterize
a homozygote, and ten features, one in each column, will be
positive with heterozygote targets. This is evident from the
wild type hybridization pattern in Figure 25, panel A. Although a
wildtype target has been hybridized and the highest intensity
features confirm the wild type sequence (ATCTT), there is an
additional hybridization in the first mutant column. Analysis
of that probe sequence shows a 10 base perfect match with the
mutant sequence.

The image in Figure 25, panel B resulted from
hybridizing a DNA chip with a target homozygous for ΔF508. In
this image five features, all with probe sequences
complementary to the mutant, show significant signal. The
mutation sequence bridging the deletion site, ATTGG, is
confirmed. Similar to what was seen in the example of the
G551D mutation, there is added information in neighboring
subarrays designed to detect the ΔI507 and F508C mutations.
This is expected since they are in such close proximity to
ΔF508 that their probe sets significantly overlap the ΔF508
probes. The ΔF508 homozygous target has no perfect matches
with wild type or mutant probes in the ΔI507 and F508C
subarrays. However, there are some low intensity signals
within these two blocks of probes. The F508C array has a
doublet that matches 11 bases of the mutant ΔF508 target.
Similarly, the hybridization in the eighth column of the ΔI507
array has a probe that matches 13/14 bases with the target.
Figure 26 shows hybridization of a heterozygous double
mutant ΔF508/F508C to the same array as described above.
Conventional reverse dot blot would score this sample as a
homozygous ΔF508 mutant. In the present assays, the ΔF508 and
F508C alleles are separately detected by the respective
subarrays designed to detect these mutations.

c. Chips for Cancer Diagnosis

There are at least two types of genes which are often
altered in cancerous cells. The first type of gene is an
oncogene such as a mismatch-repair gene, and the second type
of gene is a tumor suppressor gene such as a transcription
factor. Examples of mismatch repair oncogenes include
hMSH2 (Fishel et al., Cell 75, 1027-1038 (1993)) and hMLH1
(Papadopoulos et al., Science 263, 1625-1628 (1994)). The
most well-known example of a tumor suppressor gene is the p53
protein gene (Buchman et al., Gene 70, 245-252 (1988)). By
monitoring the state of both oncogenes and tumor suppressor
genes (individually and in combination) in a patient, it is
possible to determine individual susceptibility to a cancer
and a patient's prognosis upon cancer diagnosis, and to target
therapy more efficiently.

The p53 gene spans 20 kbp in humans and has 11 exons, 10
of which are protein coding (see Tominaga et al., 1992,
Critical Reviews in Oncogenesis 3:257-282, incorporated herein
by reference). The gene produces a 53 kilodalton
phosphoprotein that regulates DNA replication. The protein
acts to halt replication at the G1/S boundary in the cell
cycle and is believed to act as a "molecular policeman,"
shutting down replication when the DNA is damaged or blocking
the reproduction of DNA viruses (see Lane, 1992, Nature
358:15-16, incorporated herein by reference). The p53
transcription factor is part of a fundamental pathway which
controls cell growth. Wild-type p53 can halt cell growth or,
in some cases, bring about programmed cell death (apoptosis).
Such tumor-suppressive effects are absent in a variety of
known p53 gene mutations. Moreover, p53 mutants not only
deprive a cell of wild-type p53 tumor suppression, they also
may spur abnormal cell growth.

In tumor cells, p53 is the most commonly mutated gene
discovered to date (see Levine et al., 1991, Nature
351:453-456, and Hollstein et al., 1991, Science 253:49-53,
each of which is incorporated herein by reference). Over half
of the 6.5 million patients diagnosed with cancer annually
possess p53 mutations in their tumor cells. Among common
tumors, a large share of colorectal cancers and of lung cancers,
and 40% of breast cancers, contain p53 mutations. In all, over
51 types of human tumors have been documented to possess p53
mutations, including bladder, brain, breast, cervix, colon,
esophagus, larynx, liver, lung, ovary, pancreas, prostate,
skin, stomach, and thyroid tumors (Culotta & Koshland, Science
262, 1958-1961 (1993); Rodrigues et al., 1990, PNAS
87:7555-7559, incorporated herein by reference). According to
data presented by David Sidransky (1992 San Diego Conference),
over 400 mutations in p53 are known. The presence of a p53
mutation in a tumor has also been correlated with a patient's
prognosis. Patients who possess p53 mutations have a lower
5-year survival rate.

Proper diagnosis of the form of p53 in tumor cells is
critical to clinicians to prescribe appropriate therapeutic
regimens. For instance, patients with breast cancer who show
no invasion of nearby lymph nodes generally do not relapse
after standard surgical treatment and chemotherapy. For the
25% who do relapse after surgery and chemotherapy, additional
chemotherapy is appropriate. At present, there is no clear
way to determine which patients will benefit from such
additional chemotherapy prior to relapse. However,
correlating p53 mutations to tumorigenicity and metastasis
provides clinicians with a means to determine whether such
additional treatments are warranted.

In addition to facilitating conventional chemotherapy,
appropriate diagnosis of p53 mutations provides clinicians
with the ability to identify individuals who will benefit the
most from gene therapy techniques, in which appropriately
operative p53 copies are restored to a tumor site. Clinical
p53 gene therapy trials are presently underway (Culotta &
Koshland, supra).

The analysis of p53 mutations can also be used to 

identify which carcinogens lead to particular tumors (Harris, 

aflatoxin B 2 exposure is associated with G:C to T:A
transversions at residue 249 of p53 in hepatocellular
carcinomas (Hsu et al., Nature 350, 427 (1991); Bressac et
al., Nature 350, 429 (1991); Harris, supra).

While most described p53 mutations are somatic in origin,
some types of cancer are associated with germline p53
mutation. For instance, Li-Fraumeni syndrome is a hereditary
condition in which individuals receive mutant p53 alleles,
resulting in the early onset of various cancers (Harris,
supra; Frebourg et al., PNAS 89, 6413-6417 (1992); Malkin et
al., Science 250, 1233 (1990)). These mutations are
associated with instability in the rest of the genome,
creating multiple genetic alterations, and eventually leading
to cancer.

hMLH1 and hMSH2 are mismatch repair genes which are
causal agents in hereditary nonpolyposis colorectal cancer in
individuals with mutant hMLH1 or hMSH2 alleles (Fishel et al.,
supra, and Papadopoulos et al., supra). Hereditary
nonpolyposis colorectal cancer is a common genetic disorder,
affecting about 1 in 200 individuals (Lynch et al.,




Gastroenterology 104, 1535 (1993)). Detection of hMLH1 and
hMSH2 mutations in the population allows diagnosis of
nonpolyposis colorectal cancer prone individuals prior to the
manifestation of disease. This allows for the implementation
of special screening programs for cancer-prone individuals to
ensure early detection of cancer, thereby enhancing survival
rates of afflicted individuals. In addition, genetic
counselors may use the information derived from hMLH1 and
hMSH2 chips to improve family planning as described for cystic
fibrosis chips. The detection of mutations in hMLH1 and hMSH2
individually or in combination with p53 can also be used by
clinicians to assess cancer prognosis and treatment modality.
Finally, the information can be used to target appropriate
individuals for gene therapy.

The entire hMLH1 gene is less than 85 kbp in length,
comprising 2268 coding nucleotides (Papadopoulos et al.,
supra). Sequences from the gene have been deposited with
GenBank (accession number U07418). Mutations associated with
hereditary nonpolyposis colorectal cancer include the deletion
of exon 5 (codons 578-632), a 4 base pair deletion of codons
727 and 728 resulting in a shift in the reading frame of the
gene, a 4 base pair insertion at codons 755 and 756 resulting
in an extension of the COOH terminus, a 371 base pair deletion
and frameshift mutation at position 347, and a transversion
causing an alteration of codon 252 resulting in the insertion
of a stop codon (id.).

hMSH2 is a human homologue of the bacterial MutS and S.
cerevisiae MSH mismatch-repair genes. MSH2, like hMLH1, is
associated with hereditary nonpolyposis cancer. Although only
a few MSH2 gene samples from tumor tissue have been
characterized, at least some tumor samples show a T to C
transition mutation at position 2020 of the cDNA sequence,
resulting in the loss of an intron-exon splice acceptor site.

In view of the role of mutations in p53, MSH2 and/or
hMLH1 in hereditary predisposition to cancer, to neoplastic
transformation events leading to cancer and to cancer
prognosis, it is important to screen individuals to determine
whether they possess mutant alleles, and to identify precisely
which mutations the individuals possess. Because many
mutations are point mutations, or extremely small insertions
or deletions, which are generally undetectable by standard
Southern analysis, accurate diagnosis requires a capacity to
examine a gene nucleotide-by-nucleotide.

Mutations in the hMSH2, hMLH1 or p53 genes, irrespective
of whether previously characterized, can be detected by any of
the tiling strategies noted above. Reference sequences of
interest include full-length genomic and cDNA sequences of
each of these genes and subsequences thereof, such as exons
and introns. For example, each nucleotide in the 20 kb p53
genomic sequence can be tiled using the basic strategy with an
array of about 80,000 probes. As in the CFTR chip, some
reference sequences are comparatively short sequences
including the site of a known mutation and a few flanking
nucleotides. Some chips tile reference sequences that
encompass mutational "hot spots." For instance, a variety of
cellular and oncoviral proteins bind to specific regions of
p53, including Mdm2, SV40 T antigen, E1b from adenovirus and
E6 from human papilloma virus. These binding sites correlate
to some extent with observed high frequency somatic mutation
regions of p53 found in tumor cells from cancer patients (see
Harris et al., supra). Hot spots include exons 2, 3, 5, 6, 7
and 8 and the intronic regions between exons 2 and 3, 3 and 4,
and 4 and 5. Fragments of the hMLH1 gene of particular
interest include those encoding codons 578-632, 727, 728, 347,
and 252. Some chips are tiled to read mutations in each of the
hMSH2, hMLH1 and p53 genes, both wildtype and mutant versions.
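
The probe count quoted above follows from the basic strategy
using four probes per reference nucleotide (4 x 20,000 =
80,000). The following Python sketch illustrates the idea
under stated assumptions: the 15-nucleotide probe length, the
interrogation index of 7, and the function names are all
hypothetical choices made for illustration, and probes are
written in the sense of the reference (an immobilized probe
would be the complement).

```python
BASES = "ACGT"


def tile_basic(reference, probe_len=15, interrogation=7):
    """Return (position, base, probe) tuples for a basic tiling.

    For every window of `probe_len` nucleotides, four probes are
    produced that differ only at the interrogation index. One of
    the four matches the reference; the other three carry the
    possible substitutions. Probes are written in reference sense
    (an assumption made for readability).
    """
    probes = []
    for start in range(len(reference) - probe_len + 1):
        window = reference[start:start + probe_len]
        for base in BASES:
            probe = window[:interrogation] + base + window[interrogation + 1:]
            probes.append((start + interrogation, base, probe))
    return probes


def approx_probe_count(reference_length):
    """Four probes per interrogated nucleotide, ignoring edge effects."""
    return 4 * reference_length


if __name__ == "__main__":
    print(approx_probe_count(20_000))          # 20 kb reference
    print(len(tile_basic("ACGTACGTACGTACGTACGT")))
```

Edge windows are ignored in the estimate, so the count for a
real reference would be slightly lower than four times its
length.
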

Standard or asymmetric PCR can be used to generate the
target DNA used in the tiling assays described above. In
general, PCR is used to amplify hMSH2, hMLH1 or p53 sequences
from a tissue of interest such as a tumor. Mixed PCR
reactions can also be used to generate hMSH2, hMLH1 or p53
sequences simultaneously in a single reaction mixture. Any of
the coding or noncoding sequences from the genes may be
amplified for use in the block tiling assays described above.

Table 8 below provides examples of primers which are
useful in synthesizing specific regions of hMSH2, hMLH1 and
p53. Other primers can readily be devised from the known
genomic and cDNA sequences of the genes. The primers
described in Table 8 specific for p53 amplification have ends
tailored to facilitate cloning into standard restriction
enzyme cloning sites.

Table 8. Examples of PCR primers useful in amplifying regions of p53, hMLH1 and hMSH2

Region Amplified  Primer Sequence                      Description

Exon 5 (p53)      TAA TAC GAC TCA CTA TAG GGA GA CCC   Exon 5 T7 Primer (5' T7
                  TGG GCA ACC AGC CCT GTC GT           to p53 3').

Exon 5 (p53)      ATG CAA TTA ACC CTC ACT AAA GGG      Exon 5 T3 Primer (5' T3
                  AGA CAC TTG TGC CCT GAC TTT CAA C    to p53 3').

Exon 6 (p53)      TAA TAC GAC TCA CTA TAG GGA GCC      Exon 6 T7 Primer (5' T7
                  TCC TCC CAG AGA CCC                  to p53 3').

Exon 6 (p53)      ATG CAA TTA ACC CTC ACT AA GGG AGA   Exon 6 T3 Primer (5' T3
                  TCC CCA GGC CTC TGA TTC CTC ACT G    to p53 3').

Exon 7 (p53)      TAA TAC GAC TCA CTA TAG GGA CTG      Exon 7 T7 Primer (5' T7
                  GGG CAC AGC CAG GCC AGT GTG CA       to p53 3').

Exon 7 (p53)      ATG CAA TTA ACC CTC ACT AAA GGG      Exon 7 T3 Primer (5' T3
                  AGA GTC TCC CCA AGG CGC ACT GGC      to p53 3').
                  CTC A

Exon 8 (p53)      TAA TAC GAC TCA CTA TAG GGA GGG      Exon 8 T7 Primer (5' T7
                  CAT AAC TGC ACC CTT GGT CTC CTC C    to p53 3').

Exon 8 (p53)      ATG CAA TTA ACC CTC ACT AAA GGG      Exon 8 T3 Primer (5' T3
                  AGA GGA CCT GAT TTC CTT ACT GCC TCT  to p53 3').
                  TGC

hMSH2             GAC ATG GCG GTG CAG CCG AAG GAG A    Primer for MSH2, 5' to
                                                       3'. If used with MSH2
                                                       primer below, a 3033
                                                       base pair amplicon will
                                                       result.

hMSH2             CTA TGT CAA TTG CAA ACA GTG CTC AGT  Primer for hMSH2, 5' to
                  TAC AG                               3'.

hMLH1             CTT GGC TCT TCT GGC GCC AAA ATG TCG  Primer for hMLH1, 5' to
                  TTC                                  3'. If used with hMLH1
                                                       primer below, a 2484
                                                       base pair amplicon will
                                                       result.

hMLH1             TAT GTT AAG ACA CAT CTA TTT ATT TAT  Primer for hMLH1, 5' to
                  AAT CAA TCC                          3'.

After PCR amplification of the target amplicon, one strand
of the amplicon can be isolated, i.e., using a biotinylated
primer that allows capture of the undesired strand on
streptavidin beads. Alternatively, asymmetric PCR can be used
to generate a single-stranded target. Another approach
involves the generation of single stranded RNA from the PCR
product by incorporating a T7 or other RNA polymerase promoter
in one of the primers. The single-stranded material can
optionally be fragmented to generate smaller nucleic acids
with less significant secondary structure than longer nucleic
acids.

In one such method, fragmentation is combined with
labeling. To illustrate, degenerate 8-mers or other
degenerate short oligonucleotides are hybridized to the
single-stranded target material. In the next step, a DNA
polymerase is added with the four different
dideoxynucleotides, each labeled with a different fluorophore.
Fluorophore-labeled dideoxynucleotides are available from a
variety of commercial suppliers. Hybridized 8-mers are
extended by a labeled dideoxynucleotide. After an optional
purification step, i.e., with a size exclusion column, the
labeled 9-mers are hybridized to the chip. Other methods of
target fragmentation can be employed. The single-stranded DNA
can be fragmented by partial degradation with a DNAse or
partial depurination with acid. Labeling can be accomplished
in a separate step, i.e., fluorophore-labeled nucleotides are
incorporated before the fragmentation step or a DNA binding
fluorophore, such as ethidium homodimer, is attached to the
target after fragmentation.


Exemplary Chips 

a. Exon VI Chip 

To illustrate the value of the DNA chips of the present
invention in such a method, a DNA chip was synthesized by the
VLSIPS™ method to provide an array of overlapping probes which
represent or tile across a 60 base region of exon 6 of the p53
gene. To demonstrate the ability to detect substitution
mutations in the target, twelve different single substitution
mutations (wild type and three different substitutions at each
of three positions) were represented on the chip. Each of
these mutations was represented by a series of twelve 12-mer
oligonucleotide probes, which were complementary to the wild
type target except at the one substituted base. Each of the
twelve probes was complementary to a different region of the
target and contained the mutated base at a different position;
e.g., if the substitution was at base 32, the set of probes
would be complementary (with the exception of base 32) to
regions of the target 21-32, 22-33, and 32-43. This enabled
investigation of the effect of the substitution position
within the probe. The alignment of some of the probes with a
model target nucleic acid is shown in Figure 27.

To demonstrate the effect of probe length, an additional
series of ten 10-mer probes was included for each mutation
(see Figure 28). In the vicinity of the substituted
positions, the wild-type sequence was represented by every
possible overlapping 12-mer and 10-mer probe. To simplify
comparisons, the probes corresponding to each varied position
were arranged on the chip in rectangular regions with the
following structure: each row represents one substitution,
with the top row representing the wild type. Each column
contains probes complementary to the same region of the
target, with probes complementary to the 3'-end of the target
on the left and probes complementary to the 5'-end of the
target on the right. The difference between two adjacent
columns is a single base shift in the positioning of the
probes. Whenever possible, the series of 10-mer probes were
placed in four rows immediately underneath and aligned with
the 4 rows of 12-mer probes for the same mutation.

To provide model targets, 5'-fluoresceinated 12-mers
containing all possible substitutions in the first position of
codon 192 were synthesized (see the starred position in the
target in Figure 27). Solutions containing 10 nM target DNA
in 6X SSPE, 0.25% Triton X-100 were hybridized to the chip at
room temperature for several hours. While target nucleic acid
was hybridized to the chip, the fluorophores on the chip were
excited by light from an argon laser, and the chip was scanned
with an autofocusing confocal microscope. The emitted signals
were processed by a PC to produce an image using image
analysis software. By 1 to 3 hours, the signal had reached a
plateau; to remove the hybridized target and allow
hybridization to another target, the chip was stripped with
60% formamide, 2X SSPE at 17°C for 5 minutes. The washing
buffer and temperature can vary, but the buffer typically
contains 2-to-3X SSPE, 10-to-60% formamide (one can use
multiple washes, increasing the formamide concentration by 10%
each wash, and scanning between washes to determine when the
wash is complete), and optionally a small percentage of Triton
X-100, and the temperature is typically in the range of
[range illegible] °C.

Very distinct patterns were observed after hybridization
with targets with 1 base substitutions and visualization with
a confocal microscope and software analysis, as shown in
Figure 29. In general, the probes which form perfect matches
with the target retain the highest signal. For example, in
the first image, the 12-mer probes that form perfect matches
with the wild-type (WT) target are in the first row (top).
The 12-mer probes with single base mismatches are located in
the second, third, and fourth rows and have much lower
signals. The data is also depicted graphically in Figure 30.
On each graph, the X ordinate is the position of the probe in
its row on the chip, and the Y ordinate is the signal at that
probe site after hybridization. When a target with a
different one base substitution is hybridized, the
complementary set of probes has the highest signal (see
pictures 2, 3, and 4 in Figure 29 and graphs 2, 3, and 4 in
Figure 30). In each case, the probe set with no mismatches
with the target has the highest signals. Within a 12-mer
probe set, the signal was highest at position 6 or 7. The
graphs show that the signal difference between 12-mer probes
at the same X ordinate tended to be greatest at positions 5
and 8 when the target and the complementary probes formed 10
base pairs and 11 base pairs, respectively. Because tumors
often have both WT and mutant p53 genes, mixed target
populations were also hybridized to the chip, as shown in
Figure 31. When the hybridization solution consisted of a 1:1
mixture of WT 12-mer and a 12-mer with a substitution in
position 7 of the target, the sets of probes that were
perfectly matched to both targets showed higher signals than
the other probe sets.

The hybridization efficiency of a 10-mer probe array as
compared to a 12-mer probe array was also examined. The
10-mer and 12-mer probe arrays gave comparable signals (see
graphs 1-4 in Figure 30 and graphs 1-4 in Figure 32).
However, the 10-mer probe sets, which are in rows 5-8 (see
images in Figure 29), seemed to be better in this model system
than the 12-mer probe sets at resolving one target from
another, consistent with the expectation that one base
mismatches are more destabilizing for 10-mers than 12-mers.
Hybridization results within probe sets perfectly matched to
target also followed the expectation that the more matches
the individual probe formed with the target, the higher the
signal. However, duplexes with two 3' dangles (see Figure 30,
position 6 in graphs 1-4) have about as much signal as the
probes which are matched along their entire length (see Figure
30, position 7, in graphs 1-4).

This illustrative model system shows that 12-mer targets 
that differ by one base substitutions can be readily 
distinguished from one another by the novel probe array 
provided by the invention and that resolution of the different 
12-mer targets was somewhat better with the 10-mer probe sets 
than with the 12-mer probe sets. 

b. Exon V Chip

To analyze DNA from exon 5 of the p53 tumor suppressor
gene, a set of overlapping 17-mer probes was synthesized on a
chip. The probes for the WT allele were synthesized so as to
tile across the entire exon with single base overlaps between
probes. For each WT probe, a set of 4 additional probes, one
for each possible base substitution at position 7, was
synthesized and placed in a column relative to the WT probe.
Exon 5 DNA was amplified by PCR with primers flanking the
exon. One of the primers was labeled with fluorescein; the
other primer was labeled with biotin. After amplification,
the biotinylated strand was removed by binding to streptavidin
beads. The fluoresceinated strand was used in hybridization.

About 1/3 of the amplified, single-stranded nucleic acid
was hybridized overnight in 5X SSPE at 60°C to the probe chip
(under a cover slip). After washing with 6X SSPE, the chip
was scanned using confocal microscopy. Figure 33 shows an
image of the p53 chip hybridized to the target DNA. Analysis
of the intensity data showed that 93.5% of the 184 bases of
exon 5 were called in agreement with the WT sequence (see
Buchman et al., 1988, Gene 70: 245-252, incorporated herein by
reference). The miscalled bases were from positions where
probe signal intensities were tied (1.6%) and where non-WT
probes had the highest signal intensity (4.9%). Figure 34
illustrates how the actual sequence was read. Gaps in the
sequence of letters in the WT rows correspond to control
probes or sites. Positions at which bases are miscalled are
represented by letters in italic type in cells corresponding
to probes in which the WT bases have been substituted by other
bases.
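
The base-calling outcome described above (agreement, tied
intensities, or a non-WT probe with the highest signal) can be
sketched in Python as follows. This is a minimal illustration,
assuming that a base is called by taking the probe with the
highest intensity at each position; the function names and the
example intensities are hypothetical and are not taken from
the experiment.

```python
def call_base(intensities):
    """Call the base whose probe has the highest signal.

    `intensities` maps each base (A, C, G, T) to the signal of
    the probe carrying that base at the interrogation position.
    Returns None when the top signal is shared (a tie).
    """
    top = max(intensities.values())
    winners = [base for base, value in intensities.items() if value == top]
    return winners[0] if len(winners) == 1 else None


def score_calls(reference, per_position_intensities):
    """Count positions that agree with the reference, tie, or miscall."""
    agree = tied = wrong = 0
    for ref_base, intensities in zip(reference, per_position_intensities):
        call = call_base(intensities)
        if call is None:
            tied += 1
        elif call == ref_base:
            agree += 1
        else:
            wrong += 1
    return agree, tied, wrong


if __name__ == "__main__":
    reference = "ACGT"
    signals = [
        {"A": 90, "C": 10, "G": 12, "T": 8},   # clear WT call
        {"A": 5, "C": 40, "G": 40, "T": 3},    # tie
        {"A": 70, "C": 2, "G": 30, "T": 1},    # non-WT probe brightest
        {"A": 1, "C": 2, "G": 3, "T": 50},     # clear WT call
    ]
    print(score_calls(reference, signals))
```

For reference, the reported percentages are arithmetically
consistent with 172, 3 and 9 of the 184 positions falling in
the three categories, respectively.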

As the diagram indicates, the miscalled bases are from
the low intensity areas of the image, which may be due to
secondary structure in the target or probes preventing
intermolecular hybridization. To diminish the effects due to
secondary structure, one can employ shorter targets (i.e., by
target fragmentation) or use more stringent hybridization
conditions. In addition, the use of a set of probes
synthesized by tiling across the other strand of a duplex
target can also provide sequence information buried in
secondary structure in the other strand. It should be
appreciated, however, that the pattern of low intensity areas
that forms as a result of secondary structure in the target
itself provides a means to identify that a specific target
sequence is present in a sample. Other factors that may
contribute to lower signal intensities include differences in
probe densities and hybridization stabilities.

These results demonstrate the advantages provided by the
chips of the invention to genetic analysis. As another
example, heterozygous mutations are currently sequenced by an
arduous process involving cloning and repurification of DNA.
The cloning step is required, because the gel sequencing
systems are poor at resolving even a 1:1 mixture of DNA.
First, the target DNA is amplified by PCR with primers
allowing easy ligation into a vector, which is taken up by
transformation of E. coli, which in turn must be cultured,
typically on plates overnight. After growth of the bacteria,
DNA is purified in a procedure that typically takes about 2
hours; then, the sequencing reactions are performed, which
takes at least another hour, and the samples are run on the
gel for several hours, the duration depending on the length of
the fragment to be sequenced. By contrast, the present
invention provides direct analysis of the PCR amplified
material after brief transcription and fragmentation steps,
saving days of time and labor.

D. Mitochondrial Genome Chips

A human cell may have several hundred mitochondria, each
with more than one copy of mtDNA. There is strand asymmetry
in the base compositions, with one strand (Heavy) being
G rich and the other strand (Light) being C rich.
mtDNA is 30.9% A, 31.2% C, 13.1% G, and 24.[digit illegible]% T. Human
mtDNA is information-rich, encoding some 22 tRNAs, 12S and 16S
rRNAs, and 13 polypeptides involved in oxidative
phosphorylation. No introns have been detected. RNAs are
processed by cleavage at tRNA sequences, and polyadenylated
posttranscriptionally. In some transcripts, polyadenylation
also creates the stop codon, illustrating the parsimony of
coding. In many individuals, mtDNA can be treated as haploid.
However, some individuals are heteroplasmic (have more than
one mtDNA sequence), and the degree of heteroplasmy can vary
from tissue to tissue. Also, the rate of replication of
mtDNAs can differ and, together with random segregation during
cell division, can lead to changes in heteroplasmy over time.

The human mitochondrial genome is 16,569 nucleotides
long. The sequence of the L-strand is numbered arbitrarily
from the MboI-5/7 boundary in the D-loop region. The complete
sequence of the human mitochondrial genome has been published.
See Anderson et al., Nature 290, 457-465 (1981).
Mitochondrial DNA is maternally inherited, and has a mutation
rate estimated to be tenfold higher than single copy nuclear
DNA (Brown et al., Proc. Natl. Acad. Sci. USA 76, 1967-1971
(1979)). Human mtDNAs differ, on average, by about 70 base
substitutions (Wallace, Ann. Rev. Biochem. 61, 1175-1212
(1992)). Over 80% of substitutions are transitions (i.e.,
pyrimidine-pyrimidine or purine-purine).
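
The distinction drawn above between transitions and
transversions can be made concrete with a short Python sketch.
It is offered only as an illustration: the function names and
the example list of substitutions are hypothetical and do not
represent data from the cited references.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify(ref, alt):
    """Classify a single-base substitution.

    A transition exchanges a purine for a purine or a pyrimidine
    for a pyrimidine; any other substitution is a transversion.
    """
    if ref == alt:
        raise ValueError("ref and alt must differ")
    pair = {ref, alt}
    same_class = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def transition_fraction(substitutions):
    """Fraction of (ref, alt) pairs that are transitions."""
    kinds = [classify(ref, alt) for ref, alt in substitutions]
    return kinds.count("transition") / len(kinds)


if __name__ == "__main__":
    example = [("A", "G"), ("C", "T"), ("G", "T"), ("T", "C"), ("G", "A")]
    print(transition_fraction(example))
```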

Analysis of mitochondrial DNA serves several purposes.
Detection of mutations in the mitochondrial genome allows
diagnosis of a number of diseases. The mitochondrial genome
has been identified as the locus of several mutations
associated with human diseases. Some of the mutations result
in stop codons in structural genes. Such mutations have been
mapped and associated with diseases, such as Leber's
hereditary optic neuropathy, neurogenic muscular weakness,
ataxia and retinitis pigmentosa. Other mutations (nucleotide
substitutions) occur in tRNA coding sequences, and presumably
cause conformational defects in transcribed tRNA molecules.
Such mutations have also been mapped and associated with
diseases such as Myoclonic Epilepsy and Ragged Red Fiber
Disease. Another type of mutation commonly found is deletions
and/or insertions. Some deletions span segments of several
kb. Again, such mutations have been mapped and associated
with diseases, for example, ocular myopathy and Pearson
Syndrome. See Wallace, Ann. Rev. Biochem. 61, 1175-1212 (1992)
(incorporated by reference in its entirety for all purposes).
Early detection of such diseases allows metabolic or genetic
therapy to be administered before irretrievable damage has
occurred. Id. Analysis of mitochondrial DNA is also
important for forensic screening. Because the mitochondrial
genome is a locus of high variability between individuals,
sequencing a substantial length of mitochondrial DNA provides
a fingerprint that is highly specific to an individual.
Analysis of mitochondrial DNA is also important for
evolutionary and epidemiological studies.

The reference sequence can be an entire mitochondrial
genome or any fragment thereof. For forensic and
epidemiological studies, the reference sequence is often all
or part of the D-loop region in which variability between
individuals is greatest (e.g., from 16024-16401 and
[second range illegible]). For detecting mutations, the entire
genome can serve as a reference sequence, but shorter segments
including the sites of known mutations and flanking
nucleotides are also useful. Some chips have probes tiling
paired reference sequences, representing wildtype and mutant
versions of a sequence. Tiling a second reference sequence is
particularly useful for detecting an insertion mutation
occurring in [illegible] 50% of ocular myopathy [illegible]
ACCTCCCTCACCA. Some chips include reference sequences from
more than one mitochondrial genome.

Mitochondrial reference sequences can be tiled using any
of the strategies noted above. The block tiling strategy is
particularly useful for analyzing short reference sequences or
known mutations. Either the block strategy or the basic
strategy is suitable for analyzing long reference sequences.
In many of the tiling strategies, it is possible to use fewer
probes [illegible]. Most point mutations in mitochondrial DNA
are transitions, so for each wildtype nucleotide in a
reference sequence, one of the three possible nucleotide
substitutions is much more likely than the other two.
Accordingly, in the basic tiling strategy, for example, a
reference sequence can be tiled using only two probe sets.
One probe set comprises a plurality of probes, each probe
having a segment exactly complementary to the reference
sequence. The second probe set comprises a corresponding
probe for each probe in the first set. However, a probe from
the second probe set differs from the corresponding probe from
the first probe set in an interrogation position, in which the
probe from the second probe set includes the transition of the
nucleotide present in that position in the probe from the
first probe set.
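
The two-probe-set variant of the basic tiling strategy
described above can be sketched in Python as follows. The
sketch is illustrative only: the probe length of 15, the
interrogation index of 7 and the function name are assumptions
chosen for the example, and probes are written in the sense of
the reference rather than as complements.

```python
# Each base paired with its transition partner (purine <-> purine,
# pyrimidine <-> pyrimidine).
TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}


def tile_transitions(reference, probe_len=15, interrogation=7):
    """Return (first_set_probe, second_set_probe) pairs.

    The first probe matches the reference over the whole window.
    The second probe is identical except at the interrogation
    index, where it carries the transition of the reference base.
    """
    pairs = []
    for start in range(len(reference) - probe_len + 1):
        first = reference[start:start + probe_len]
        swapped = TRANSITION[first[interrogation]]
        second = first[:interrogation] + swapped + first[interrogation + 1:]
        pairs.append((first, second))
    return pairs


if __name__ == "__main__":
    for first, second in tile_transitions("ACGTACGTACGTACGTA"):
        print(first, second)
```

Compared with four probes per position, this design halves
the number of probes at the cost of not interrogating the two
transversions directly.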

Target mitochondrial DNA can be amplified, labelled and 
fragmented prior to hybridization using the same procedures as 
described for other chips. Use of at least two labelled 
nucleotides is desirable to achieve uniform labelling. Some 
exemplary primers are described below and other primers can be 
designed from the known sequence of mitochondrial DNA. 
Because mitochondrial DNA is present in multiple copies per 
cell, it can also be hybridized directly to a chip without 
prior amplification. 



Exemplary Chips

The invention provides a DNA chip for analyzing sequences
contained in a 1.3 kb fragment of human mitochondrial DNA from
the "D-loop" region, the most polymorphic region of human
mitochondrial DNA. One such chip comprises a set of 269
overlapping oligonucleotide probes of varying length in the
range of 9-14 nucleotides with varying overlaps arranged in
~600 x 600 micron features or synthesis sites in an array 1 cm
x 1 cm in size. The probes on the chip are shown in columnar
form below. An illustrative mitochondrial DNA chip of the
invention comprises the following probes (X, Y coordinates are
shown, followed by the sequence; "DL3" represents the 3'-end
of the probe, which is covalently attached to the chip
surface.)
0 0 DL3 AGTGGGGTATTT
1 0 DL3 GGGTATTTAGTT
2 0 DL3 TTAGTTTATCCAA
3 0 DL3 ATCCAAACCAGG
4 0 DL3 ACCAGGATCGGA
5 0 DL3 CGTGTGTGTGTGG
6 0 DL3 CGTGTGTGTGTGGC
7 0 DL3 TCGTGTGTGTGTGG
8 0 DL3 GTAGGATGGGTC
9 0 DL3 AGGATGGGTCGT
10 0 DL3 GATGGGTCGTGT
11 0 DL3 TGGCGACGATTG
12 0 DL3 GCGACGATTGGG
13 0 DL3 TGGGGGGGA
14 0 DL3 GAGGGGGCG
15 0 DL3 GGAGGGGGCGA
16 0 DL3 GAGGGGGCGA
0 1 DL3 GGCTTGGTTGG
1 1 DL3 GGTTGGTTTGGG
2 1 DL3 TGGGGTTTCTAG
3 1 DL3 GTTTCTAGTGGG
4 1 DL3 AGTGGGGGGTGT
5 1 DL3 GGGGTGTCAAAT
6 1 DL3 GTCAAATACATCG
7 1 DL3 ACATCGAATGGAG
8 1 DL3 CGAATGGAGGAG
9 1 DL3 GAGGAGTTTCGT
10 1 DL3 TTTCGTTATGTGA
11 1 DL3 ATGTGACTTTTAC
12 1 DL3 GACTTTTACAAAT
13 1 DL3 AAATCTGCCCGA
14 1 DL3 AATCTGCCCGAG
15 1 DL3 CCCGAGTGTAGT
16 1 DL3 AGTGTAGTGGGG
0 2 DL3 GGGAGGGTGAG
1 2 DL3 GGTGAGGGTATG
2 2 DL3 GGTATGATGATTAG

n ^ DL3 GGATGGGTCGTr 
11 1 ^3GGTCGTGTGTGT 
" 2 DL3GTGTGTGTGGCG 

3 DL3GGATTGGTc?Sa 

3 ° L3 tctaaagtttaaJ 

3 DL3 GTTTAAAATAGAA 
3 DL3ATAGAAAAACCG 
3 DL3AGAAAAA^G? 
3 DL3AACCGCCATAC 
3 DL3 CCATACGTGAAAA 
3 DL3ACGTGAAAATTGT 

4 DL3GGAGGGGGCG 
4 SS GGGGCG AAGAC 
4 S^ G ^ GACCGG ATG 
* P CCGGATGTCGTG 

? DL3CGTGAATTTGTGT 

* ^3ACGGTTTGGGG 
DL3TGGGGTTTTTGT 
DL3GGGTTTTTGTTT 
DL3TTGTTTCTTGGG 
DL3 TCTTGGGATTGTr 

^3TGTATGA^?GA??T 

DL3TGATTTCACACAA 

S^CAATTA^^ 

DL3 AATTAATTACGAA 
DL3TACGAACAT? GAA 

DL3ACGAACATCCTGT 
DL3TCCTGTATTATTA 
D L3 GTATTATTATTGTT 
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0 
1 
2 

3 8 

4 8 

5 8 



i f S t l , 3cctcg tgggat? 

° DL3GA CGGAGTAGra 
f f DL3G TAGGATAA? GA a 

0 7 DL3 GTGAGTGCCCTC 

3 7 S^ CCTCGAG AGGTA 

4 7 n^ 3AGAGGT ACGTAA 

* 7 dlsacgtaaaccatJ 

6 y S^f ataaa ^ g Sg 

* 7 D^ CCAT ^ 

10 7 £r L3GATACG rGCGCT 

11 7 S 3 S GCGCTATC AG 

14 7 n^ 3GTAACGG TCTC? 
16 -7 £t GACCTC GGCCT 



DL3 TCGGCCTCG'rir 
! P5* 3 GA ^ G AAGTCCCAG 
DL3 GTATTTCCCiiimI 



6 I n^ ATCGGGTG TGCA 

7 I m 3 ^ GTGCAAGG GGA 
8 DL3CAAGGGGAATTT 
14 8 DL3 AGATAGTGGGATA
15 8 DL3 GGGATAATTGGT
16 8 DL3 TAATTGGTGAGTG
0 9 DL3 TATAGGGCGTGT
1 9 DL3 GGCGTGTTCTCA
2 9 DL3 GTGTTCTCACGAT
3 9 DL3 TCACGATGAGAGG
4 9 DL3 ATGAGAGGAGCG
5 9 DL3 AGGAGCGAGGC
6 9 DL3 CGAGGCCCGG
7 9 DL3 GCCCGGGTATT
8 9 DL3 CGGGTATTGTGA
9 9 DL3 GTGAACCCCCAT
10 9 DL3 CCCCATCGATTT
11 9 DL3 ATCGATTTCACTT
12 9 DL3 TTTCACTTGACAT
13 9 DL3 TTGACATAGAGCT
14 9 DL3 TAGAGCTGTAGAC
15 9 DL3 GTAGACCAAGGA
16 9 DL3 ACCAAGGATGAAG
0 10 DL3 CGTGTAATGTCAG
1 10 DL3 TGTCAGTTTAGGG
2 10 DL3 TCAGTTTAGGGA
3 10 DL3 TAGGGAAGAGCA
8 12 DL3 TGTTCGTTCATGT
9 12 DL3 CGTTCATGTCGTT
10 12 DL3 GTCGTTAGTTGG
11 12 DL3 TAGTTGGGAGTT
12 12 DL3 GGAGTTGATAGTG
13 12 DL3 ATAGTGTGTAGTT
14 12 DL3 GTGTAGTTGACGT
15 12 DL3 TGACGTT^T
16 12 DL3 CGTTGAGGTTTA
5 13 DL3 TATAACATGCCAT
6 13 DL3 AACATGCCATGGT
7 13 DL3 CCATGGTATTTAT
8 13 DL3 ATTTATGAACTGG
9 13 DL3 AACTGGTGGACAT
10 13 DL3 TGGACATCATGTA
11 13 DL3 CATGTATTTTTGG
12 13 DL3 TTTTGGGTTAGG
13 13 DL3 GGGTTAGGATGT
14 13 DL3 GGATGTAGTTTTG
15 13 DL3 TGTAGTTTTGGG
16 13 DL3 TTTGGGGGAGG
5 14 DL3 GGGTTCATAACTG
6 14 DL3 ATAACTGAGTGGG
7 14 DL3 AACTGAGTGGGT
4 10 DL3 AAGAGCAGGGGT
5 10 DL3 CAGGGGTACCTA
6 10 DL3 GGTACCTACTGG
7 10 DL3 TACTGGGGGGA
8 10 DL3 GGGGGAGTCTAT
9 10 DL3 AGTCTATCCCCA
10 10 DL3 ATCCCCAGGGA
11 10 DL3 CAGGGAACTGGT
12 10 DL3 ACTGGTGGTAGG
13 10 DL3 CTGGTGGTAGGA
14 10 DL3 GTAGGAGGCACA
15 10 DL3 GGCACATTTAGT
16 10 DL3 TTTAGTTATAGGG
0 11 DL3 AGGTTTACGGTG
1 11 DL3 TACGGTGGGGA
2 11 DL3 GTGGGGAGTGG
3 11 DL3 GGGAGTGGGTGA
4 11 DL3 GGGTGATCCTATG
5 11 DL3 CCTATGGTTGTTT
6 11 DL3 GGTTGTTTGGATG
7 11 DL3 GTTTGGATGGGT
8 11 DL3 ATGGGTGGGAAT
9 11 DL3 GGGAATTGTCATG
10 11 DL3 GTCATGTATCATGT
11 11 DL3 TCATGTATTTCGG
12 11 DL3 TATTTCGGTAAA
13 11 DL3 TTCGGTAAATGG
14 11 DL3 GTAAATGGCATGT
15 11 DL3 GCATGTAATCGTG
16 11 DL3 GTAATCGTGTAAT

I i 2 2 SSSSSKK* 

6 7 " 2 DWACGAATGTTCGTT 



8 14 DL3 GTGGGTAGTTGT
9 14 DL3 GTAGTTGTTGGC
10 14 DL3 GTTGGCGATACA
11 14 DL3 CGATACATAAAAG
12 14 DL3 TAAAAGCATGTAA
13 14 DL3 GCATGTAATGACG
14 14 DL3 ATGACGGTCGGT
15 14 DL3 GTCGGTGGTACT
16 14 DL3 GGTACTTATAACA
5 15 DL3 TCGATTCTAAGAT
6 15 DL3 TAAGATTAAATTT
7 15 DL3 AAATTTGAATAAG
8 15 DL3 AATAAGAGACAAG
9 15 DL3 AAGAGACAAGAAA
10 15 DL3 AAGAAAGTACCC
11 15 DL3 AAAGTACCCCTT
12 15 DL3 CCCCTTCGTCTA
13 15 DL3 CTTCGTCTAAAC
14 15 DL3 CTAAACCCATGG
15 15 DL3 AACCCATGGTGG
16 15 DL3 TGGTGGGTTCAT
5 16 DL3 TTGGAAAAAGGT
6 16 DL3 AAAAGGTTCCTG
7 16 DL3 GGTTCCTGTTTA
8 16 DL3 CCTGTTTAGTCTC
9 16 DL3 TTAGTCTCTTTTT
10 16 DL3 CTTTTTCAGAAAT
11 16 DL3 AGAAATTGAGGTG
12 16 DL3 AAATTGAGGTGGT
13 16 DL3 GGTGGTAATCGT
14 16 DL3 TAATCGTGGGTT
15 16 DL3 GTGGGTTTCGAT
16 16 DL3 GGTTTCGATTCT

No probes were present in positions X, Y = 0, 12 to X, Y = 4,
12; X, Y = 0, 13 to X, Y = 4, 13; X, Y = 0, 14 to X, Y = 4,
14; X, Y = 0, 15 to X, Y = 4, 15; X, Y = 0, 16 to X, Y = 4, 16.
The length of each of the probes on the chip was variable to
minimize differences in melting temperature and potential for
cross-hybridization. Each position in the sequence was
represented by at least one probe and most positions were
represented by 2 or more probes. As noted above, the amount
of overlap between the oligonucleotides varied from probe to
probe. Figure 35 shows the [remainder of sentence illegible].
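
To illustrate why varying probe length can reduce differences
in melting temperature, the following Python sketch applies
the Wallace rule, Tm = 2(A+T) + 4(G+C) °C. The Wallace rule is
a rule of thumb for short oligonucleotides that is brought in
here purely for illustration; the specification does not state
which method was used to choose probe lengths, and the
function name is hypothetical. The two sequences are probes
(0, 0) and (14, 0) from the listing above.

```python
def wallace_tm(sequence):
    """Rough melting temperature (deg C) by the Wallace rule.

    Tm = 2 * (number of A and T) + 4 * (number of G and C).
    Only a coarse approximation for short oligonucleotides.
    """
    seq = sequence.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    for probe in ("AGTGGGGTATTT", "GAGGGGGCG"):
        print(probe, len(probe), wallace_tm(probe))
```

Under this approximation, the AT-containing 12-mer and the
GC-rich 9-mer receive the same estimate, which is the kind of
equalization that variable probe length makes possible.
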

DNA was prepared from hair roots of six individuals
[illegible] and then amplified by PCR and cloned into M13; the
clones were sequenced to verify that the desired specific
sequences were present. DNA from the sequenced M13 clones was
amplified by PCR, transcribed in vitro, and labeled with
fluorescein [illegible] polymerase. The 1.3 kb RNA
transcripts were fragmented and hybridized to the chip. The
results showed that each different individual had DNA that
produced a unique hybridization fingerprint on the chip and
that the differences in the observed patterns could be
correlated with differences in the cloned genomic DNA
sequence. The results also demonstrated that very long
sequences of a target nucleic acid can be represented
comprehensively as a specific set of overlapping
oligonucleotides and that arrays of such probe sets can be
usefully applied to genetic analysis.

The sample nucleic acid was hybridized to the chip in a
solution composed of 6X SSPE, 0.1% Triton X-100 for 60
minutes at [temperature illegible]. The chip was then scanned
by scanning fluorescence microscopy. The individual features
on the chip were 588 x 588 microns, but the lower left 5 x 5
square features in the array did not contain probes. To
quantitate the data, pixel counts were measured within each
synthesis site. Pixels represent [illegible]. The
fluorescence intensity for each feature was scaled to a mean
determined from 27 bright features. After scanning, the chip
was stripped and rehybridized; all six samples were hybridized
to the same chip. Figure 36 shows the image observed from the
mt4 sample on the DNA chip. Figure 37 shows the image
observed from the mt5 sample on the DNA chip. Figure 38 shows
the predicted difference image between the mt4 and mt5 samples
on the DNA chip based on mismatches between the two samples
and the reference sequence (see Anderson et al., supra).
Figure 39 shows the actual difference image observed.

The results show that, in almost all cases, mismatched
probe/target hybrids resulted in lower fluorescence intensity
than perfectly matched hybrids. Nonetheless, some probes
detected mutations (or specific sequences) better than others,
and in several cases, the differences were within noise
levels. Improvements can be realized by increasing the amount
of overlap between probes and hence overall probe density and,
for duplex DNA targets, using a second set of probes, either
on the same or a separate chip, corresponding to the second
strand of the target. Figure 40, in sheets 1 and 2, shows a
plot of normalized intensities across rows 10 and 11 of the
array and a tabulation of the mutations detected.

Figure 41 shows the discrimination between wild-type and
mutant hybrids obtained with this chip. The median of the six
normalized hybridization scores for each probe was taken. The
graph plots the ratio of the median score to the normalized
hybridization score versus mean counts. On this graph, a
ratio of 1.6 and mean counts above 50 yield no false
positives, and while it is clear that detection of some
mutants can be improved, excellent discrimination is achieved,
considering the small size of the array. Figure 42
illustrates how the identity of the base mismatch may
influence the ability to discriminate mutant and wild-type
sequences more than the position of the mismatch within an
oligonucleotide probe. The mismatch position is expressed as
% of probe length from the 3'-end. The base change is
indicated on the graph. These results show that the DNA chip
increases the capacity of the standard reverse dot blot format
by orders of magnitude, extending the power of that approach
many fold and that the methods of the invention are more
efficient and easier to automate than gel-based methods of
nucleic acid sequence and mutation analysis.

To illustrate further these advantages, a second chip was
prepared for analyzing a longer segment from human
mitochondrial DNA (mtDNA). The chip "tiles" through 648
nucleotides of a reference sequence comprising human H strand
mtDNA from positions 16280 to 356, and allows analysis of each
nucleotide in the reference sequence. The probes in the array
are 15 nucleotides in length, and each position in the target
sequence is represented by a set of 4 probes (A, C, G, T
substitutions), which differ from one another at position 7
from the 3'-end. The array consists of 13 blocks of 4 x 50
probes; each block scans through 50 nucleotides of contiguous
mtDNA sequence. The blocks are separated by blank rows. The
4 corner columns contain control probes. There are a total of
2600 probes in a 1.28 cm x 1.28 cm square area, and
each probe area (feature) is 256 x 197 microns.
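
The tiling layout described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the layout software used for the chip: the function name tile_probes, the toy reference sequence, and the convention that the reference is written 5' to 3' in the same sense as the probes are all hypothetical choices made for the example.

```python
BASES = "ACGT"


def tile_probes(reference, probe_len=15, pos_from_3prime=7):
    """Return a list of (interrogated_index, {base: probe}) tuples.

    Assumption: `reference` is written 5'->3' in the same sense as the
    probes. On a real chip the probes are complementary to the labeled
    target strand; that step is omitted here for clarity.
    """
    # Position 1 is the 3'-terminal base, so position 7 from the 3' end
    # of a 15-mer is 0-based index 15 - 7 = 8 counted from the 5' end.
    k = probe_len - pos_from_3prime
    sets = []
    for start in range(len(reference) - probe_len + 1):
        window = reference[start:start + probe_len]
        # Four probes per position: identical except at index k.
        probes = {b: window[:k] + b + window[k + 1:] for b in BASES}
        sets.append((start + k, probes))
    return sets


reference = "ACGTTGCAAGCTTGACCTGA"  # hypothetical 20-nt reference
for index, probes in tile_probes(reference):
    print(index, reference[index], probes)
```

In each set, exactly one of the four probes matches the reference window; the other three carry a single substitution at the interrogation position.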

Target RNA was prepared as above. The RNA was fragmented 
and hybridized to the oligonucleotide array in a solution 
composed of 6X SSPE, 0.1% Triton X-100 for 60 minutes at 18°C.
Unhybridized material was washed away with buffer, and the 
chip was scanned at 25 micron pixel resolution. 

Figure 43 provides a 5' to 3' sequence listing of one
target corresponding to the probes on the chip. X is a
control probe. Positions that differ in the target (i.e., are
mismatched with the probe at the designated site) are in bold.
Figure 44 shows the fluorescence image produced by scanning
the chip when hybridized to this sample. About 95% of the
sequence could be read correctly from only one strand of the
original duplex target nucleic acid. Although some probes did
not provide excellent discrimination and some probes did not
appear to hybridize to the target efficiently, excellent
results were achieved. The target sequence differed from the
probe set at six positions: 4 transitions and 2 insertions.
All 4 transitions were detected, and specific probes could
readily be incorporated into the array to detect insertions or
deletions. Figure 45 illustrates the detection of 4
transitions in the target sequence relative to the wild-type
probes on the chip.

A further chip was constructed comprising probes tiling
across the entire D-loop region (1.3 kb) of mtDNA sequences
from two humans. The probes were tiled in rows of four using
the basic tiling strategy. The probes were overlapping
15-mers having an interrogation position 7 nucleotides from the
3' end. The complete group of probes tiled on the reference
sequence from the first individual, designated mt1, occupied
the upper half of the chip. The lower half of the chip
contained a similar arrangement based on a second clone, mt2.
The probes were synthesized in a 1.28 x 1.28 cm area, which
contained a matrix of 115 x 120 cells. The chip contained a
total of 10,488 mtDNA probes.

Six samples of target DNA were extracted from hair roots
of six individuals. The 1.3 kb region spanning positions
15935 to 667 of human mtDNA was PCR amplified, cloned in
bacteriophage M13 and sequenced by conventional methods. The
1.3 kb region was reamplified from the phage clone using
primers L15935-T3,
5'-CTCGGAATTAACCCTCACTAAAGGAAACCTTTTTCCAAGGA, and H667-T7,
5'-TAATACGACTCACTATAGGGAGAGGCTAGGACCAAACCTATT, tagged with T3
and T7 RNA polymerase promoter sequences. Labelled RNA was
generated by in vitro transcription using T3 RNA polymerase
and fluoresceinated nucleotides, fragmented, and hybridized to
the mtDNA control region resequencing chip at room temperature
for 60 min in 6xSSPE + 0.05% Triton X-100. Six washes were
carried out at room temperature, using 6xSSPE + 0.005% Triton
X-100, and the chip was read. Signal intensities varied
considerably over the chip, but the large dynamic range of the
detection system allowed accurate quantitation of intensities
over several orders of magnitude. Even relatively low signal
intensities yielded accurate results.

Five different clones (mt1-5) were hybridized, each to a
separate chip. The reference sequence was also hybridized for
comparative purposes. Mean counts per probe cell were
determined and used by automated basecalling software to read
the sequence. The accuracy of sequence read from the chip is
summarized as follows. Combining the data from the five
targets analyzed, the chip read a total of 6310 nucleotides.
Of these nucleotides in the target sequences, 55 were
different from the reference sequence (as judged by
conventional sequencing). 41 of these 55 nucleotides were both
detected and read correctly from the chip. 6 of 55
nucleotides were detected as being ambiguous but their
identity could not be read. 2 of 55 nucleotides were detected
as mutations, but their identity was miscalled. 6 of 55
nucleotides were incorrectly called as wildtype. Of the 6255
nucleotides in the target sequence that were identical to the
reference sequence, only 36 (0.57%) were miscalled or scored
as ambiguous.
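
The tallies above are internally consistent, which can be checked with a short calculation. The sketch below only restates the counts given in the text; the variable names and category labels are hypothetical and chosen for illustration.

```python
total_read = 6310  # nucleotides read from the five targets combined

# Outcome of the calls at positions that differ from the reference.
variant_calls = {
    "detected and read correctly": 41,
    "detected but ambiguous": 6,
    "detected but miscalled": 2,
    "called wild-type": 6,
}

n_variant = sum(variant_calls.values())
n_identical = total_read - n_variant
detected = n_variant - variant_calls["called wild-type"]
miscalled_identical = 36  # identical positions miscalled or ambiguous

print(f"variant positions: {n_variant}")
print(f"identical positions: {n_identical}")
print(f"variants detected: {detected / n_variant:.1%}")
print(f"variants read correctly: "
      f"{variant_calls['detected and read correctly'] / n_variant:.1%}")
print(f"error rate at identical positions: "
      f"{miscalled_identical / n_identical:.3%}")
```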

A further chip was constructed comprising probes tiling
across a reference sequence comprising an entire mitochondrial
genome. In this chip, a block tiling strategy was used. Each
block was designed to analyze seven nucleotides from a target
sequence. Each block consisted of four probe sets, the probe
sets each having seven probes. A block was laid down on the
chip in seven columns of four probes. The upper probe was the
same in each column, this being a probe exactly complementary
to a subsequence of the reference sequence. The three other
probes in each column were identical to the upper probe except
in an interrogation position, which was occupied by a
different base in each of the four probes in the column. The
interrogation position shifted by one position between
successive columns. Thus, except for the seven interrogation
positions, one in each of the columns of probes, all probes
occupying a block were identical. The array comprised many
such blocks, each tiled to successive subsequences of the
mitochondrial DNA reference sequence. In all, the chip tiled
15,569 nucleotides of reference sequence with double tiling at
42 positions. 66,276 probes occupied an array of 304 x 315
cells, each cell having an area of 42 x 41 microns.

The chip was hybridized to the same target sequences as
described for the D-loop region, except that hybridization was
at 15°C for 2 hr. The chip was scanned at 5 micron resolution
to give an image with approximately 64 pixels per cell. For
blocks of probes tiling across the D-loop region, a
sequence-specific hybridization pattern was obtained. For other
blocks, only background hybridization was observed.

These results illustrate that longer sequences can be
read using the DNA chips and methods of the invention, as
compared to conventional sequencing methods, where reading
length is limited by the resolution of gel electrophoresis.
Hybridization and signal detection require less than an hour
and can be readily shortened by appropriate choice of buffers,
temperatures, probes, and reagents.

III. MODES OF PRACTICING THE INVENTION 
A. VLSIPS™ Technology

As noted above, the VLSIPS™ technology is described in a
number of patent publications and is preferred for making the
oligonucleotide arrays of the invention. A brief description
of how this technology can be used to make and screen DNA
chips is provided in this Example and the accompanying
Figures. In the VLSIPS™ method, light is shone through a mask
to activate functional (for oligonucleotides, typically an
-OH) groups protected with a photoremovable protecting group
on a surface of a solid support. After light activation, a
nucleoside building block, itself protected with a
photoremovable protecting group (at the 5'-OH), is coupled to
the activated areas of the support. The process can be
repeated, using different masks or mask orientations and
building blocks, to prepare very dense arrays of many
different oligonucleotide probes. The process is illustrated
in Figure 46; Figure 47 illustrates how the process can be
used to prepare "nucleoside combinatorials" or
oligonucleotides synthesized by coupling all four nucleosides
to form dimers, trimers and so forth.

New methods for the combinatorial chemical synthesis of
peptide, polycarbamate, and oligonucleotide arrays have
recently been reported (see Fodor et al., 1991, Science 251:
767-773; Cho et al., 1993, Science 261: 1303-1305; and
Southern et al., 1992, Genomics 13: 1008-1017, each of which
is incorporated herein by reference). These arrays, or
biological chips (see Fodor et al., 1993, Nature 364: 555-556,
incorporated herein by reference), harbor specific chemical
compounds at precise locations in a high-density, information
rich format, and are a powerful tool for the study of
biological recognition processes. A particularly exciting
application of the array technology is in the field of DNA
sequence analysis. The hybridization pattern of a DNA target
to an array of shorter oligonucleotide probes is used to gain
primary structure information of the DNA target. This format
has important applications in sequencing by hybridization, DNA
diagnostics and in elucidating the thermodynamic parameters
affecting nucleic acid recognition.

Conventional DNA sequencing technology is a laborious
procedure requiring electrophoretic size separation of labeled
DNA fragments. An alternative approach, termed Sequencing By
Hybridization (SBH), has been proposed (Lysov et al., 1988,
Dokl. Akad. Nauk SSSR 303:1508-1511; Bains et al., 1988, J.
Theor. Biol. 135:303-307; and Drmanac et al., 1989, Genomics
4:114-128, incorporated herein by reference). This method
uses a set of short oligonucleotide probes of defined sequence
to search for complementary sequences on a longer target
strand of DNA. The hybridization pattern is used to
reconstruct the target DNA sequence. It is envisioned that
hybridization analysis of large numbers of probes can be used
to sequence long stretches of DNA. In immediate applications
of this hybridization methodology, a small number of probes
can be used to interrogate local DNA sequence.

The strategy of SBH can be illustrated by the following
example. A 12-mer target DNA sequence, AGCCTAGCTGAA, is mixed
with a complete set of octanucleotide probes. If only perfect
complementarity is considered, five of the 65,536 octamer
probes (TCGGATCG, CGGATCGA, GGATCGAC, GATCGACT, and ATCGACTT)
will hybridize to the target. Alignment of the overlapping
sequences from the hybridizing probes reconstructs the
complement of the original 12-mer target:

    TCGGATCG
     CGGATCGA
      GGATCGAC
       GATCGACT
        ATCGACTT
    TCGGATCGACTT
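
The alignment step in this example can be reproduced with a short Python sketch. It follows the example's convention of writing each probe as the base-by-base complement of the target, and it uses a simple greedy overlap chain; both function names are hypothetical, and the reconstruction shown here is an illustrative assumption rather than the method of the cited SBH references. It fails by design when a (k-1)-mer repeats within the target, since the chain then becomes ambiguous.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def perfect_match_probes(target, k=8):
    """All k-mers of the base-by-base complement of `target`."""
    comp = target.translate(COMPLEMENT)
    return [comp[i:i + k] for i in range(len(comp) - k + 1)]


def reconstruct(probes):
    """Chain probes that overlap by k-1 bases into one sequence."""
    k = len(probes[0])
    remaining = set(probes)
    # The first probe is the only one whose (k-1)-base prefix is not
    # the (k-1)-base suffix of some other probe.
    suffixes = {p[1:] for p in remaining}
    starts = [p for p in remaining if p[:-1] not in suffixes]
    if len(starts) != 1:
        raise ValueError("no unique starting probe")
    sequence = starts[0]
    remaining.remove(starts[0])
    while remaining:
        tail = sequence[-(k - 1):]
        nxt = [p for p in remaining if p[:-1] == tail]
        if len(nxt) != 1:
            raise ValueError("ambiguous or broken overlap chain")
        sequence += nxt[0][-1]  # each new probe adds one base
        remaining.remove(nxt[0])
    return sequence


probes = perfect_match_probes("AGCCTAGCTGAA")
print(probes)
print(reconstruct(probes))
```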

Hybridization methodology can be carried out by attaching
target DNA to a surface. The target is interrogated with a
set of oligonucleotide probes, one at a time (see Strezoska et
al., 1991, Proc. Natl. Acad. Sci. USA 88:10089-10093, and
Drmanac et al., 1993, Science 260:1649-1652, each of which is
incorporated herein by reference). This approach can be
implemented with well established methods of immobilization
and hybridization detection, but involves a large number of
manipulations. For example, to probe a sequence utilizing a
full set of octanucleotides, tens of thousands of
hybridization reactions must be performed. Alternatively, SBH
can be carried out by attaching probes to a surface in an
array format where the identity of the probes at each site is
known. The target DNA is then added to the array of probes.
The hybridization pattern determined in a single experiment
directly reveals the identity of all complementary probes.

As noted above, a preferred method of oligonucleotide
probe array synthesis involves the use of light to direct the
synthesis of oligonucleotide probes in high-density,
miniaturized arrays. Photolabile 5'-protected
N-acyl-deoxynucleoside phosphoramidites, surface linker
chemistry, and versatile combinatorial synthesis strategies
have been developed for this technology. Matrices of
spatially-defined oligonucleotide probes have been generated,
and the ability to use these arrays to identify complementary
sequences has been demonstrated by hybridizing fluorescent
labeled oligonucleotides to the DNA chips produced by the
methods. The hybridization pattern demonstrates a high degree
of base specificity and reveals the sequence of
oligonucleotide targets.

The basic strategy for light-directed oligonucleotide
synthesis (1) is outlined in Fig. 46. The surface of a solid
support modified with photolabile protecting groups (X) is
illuminated through a photolithographic mask, yielding
reactive hydroxyl groups in the illuminated regions. A
3'-O-phosphoramidite activated deoxynucleoside (protected at
the 5'-hydroxyl with a photolabile group) is then presented to
the surface and coupling occurs at sites that were exposed to
light. Following capping and oxidation, the substrate is
rinsed and the surface illuminated through a second mask, to
expose additional hydroxyl groups for coupling. A second
5'-protected, 3'-O-phosphoramidite activated deoxynucleoside
is presented to the surface. The selective photodeprotection
and coupling cycles are repeated until the desired set of
products is obtained.

Light directed chemical synthesis lends itself to highly
efficient synthesis strategies which will generate a maximum
number of compounds in a minimum number of chemical steps.
For example, the complete set of 4^n polynucleotides (length
n), or any subset of this set, can be produced in only 4 x n
chemical steps. See Fig. 47. The patterns of illumination
and the order of chemical reactants ultimately define the
products and their locations. Because photolithography is
used, the process can be miniaturized to generate high-density
arrays of oligonucleotide probes. For an example of the
nomenclature useful for describing such arrays, an array
containing all possible octanucleotides of dA and dT is
written as (A + T)^8. Expansion of this polynomial reveals the
identity of all 256 octanucleotide probes from AAAAAAAA to
TTTTTTTT. A DNA array composed of complete sets of
dinucleotides is referred to as having a complexity of 2. The
array given by (A+T+C+G)^8 is the full 65,536 octanucleotide
array of complexity four. Computer-aided methods of laying
down predesigned arrays of probes using VLSIPS™ technology are
described in commonly-assigned co-pending application USSN
08/249,188, filed May 24, 1994 (incorporated by reference in
its entirety for all purposes).
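
The claim that 4^n probes require only 4 x n chemical steps can be illustrated with a small simulation. The sketch below is a hypothetical model, not the mask design of Fig. 47: it assumes one synthesis site per final probe and a striped mask layout in which each coupling step illuminates exactly the sites that should receive the current base. The function name light_directed_synthesis is an assumption made for the example.

```python
def light_directed_synthesis(bases, n):
    """Simulate masked coupling cycles that build every length-n probe.

    Returns (probes, steps), where `steps` counts photolysis-plus-coupling
    steps. The striped mask layout is an illustrative assumption.
    """
    b = len(bases)
    sites = [""] * (b ** n)            # one synthesis site per final probe
    steps = 0
    for layer in range(n):             # one nucleotide position per layer
        stripe = b ** (n - 1 - layer)  # stripe width of this layer's masks
        for j, base in enumerate(bases):
            # "Mask": couple `base` only at the sites selected for it.
            for s in range(len(sites)):
                if (s // stripe) % b == j:
                    sites[s] += base
            steps += 1
    return sites, steps


probes, steps = light_directed_synthesis("AT", 8)   # the (A + T)^8 array
print(len(probes), steps, probes[0], probes[-1])
```

With two bases and n = 8 the simulation produces the 256 probes of the (A + T)^8 array in 2 x 8 = 16 steps; with all four bases the same loop takes 4 x n steps.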

To carry out hybridization of DNA targets to the probe
arrays, the arrays are mounted in a thermostatically
controlled hybridization chamber. Fluorescein labeled DNA
targets are injected into the chamber and hybridization is
allowed to proceed for 5 min to 24 hr. The surface of the
matrix is scanned in an epifluorescence microscope (Zeiss
Axioscop 20) equipped with photon counting electronics using
50-100 µW of 488 nm excitation from an Argon ion laser
(Spectra Physics Model 2020). Measurements may be made with
the target solution in contact with the probe matrix or after
washing. Photon counts are stored and image files are
presented after conversion to an eight bit image format. See
Fig. 51.

When hybridizing a DNA target to an oligonucleotide
array, N = Lt-(Lp-1) complementary hybrids are expected, where
N is the number of hybrids, Lt is the length of the DNA
target, and Lp is the length of the oligonucleotide probes on
the array. For example, for an 11-mer target hybridized to an
octanucleotide array, N = 4. Hybridizations with mismatches
at positions that are 2 to 3 residues from either end of the
probes will generate detectable signals. Modifying the above
expression for N, one arrives at a relationship estimating the
number of detectable hybridizations (Nd) for a DNA target of
length Lt and an array of complexity C. Assuming an average
of 5 positions giving signals above background:
Nd = (1 + 5(C-1))[Lt-(Lp-1)].
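
The two expressions can be evaluated directly. The following sketch simply encodes the formulas as written; the function and parameter names are hypothetical, and the clamp to zero for targets shorter than the probes is an added assumption.

```python
def perfect_hybrids(lt, lp):
    """N = Lt - (Lp - 1): perfectly matched hybrids for a target of
    length lt on an array of probes of length lp (zero if lt < lp)."""
    return max(0, lt - (lp - 1))


def detectable_hybrids(lt, lp, c, mismatch_positions=5):
    """Nd = (1 + 5(C - 1)) [Lt - (Lp - 1)] for an array of complexity c,
    with 5 mismatch positions giving signal above background by default."""
    return (1 + mismatch_positions * (c - 1)) * perfect_hybrids(lt, lp)


print(perfect_hybrids(11, 8))        # the 11-mer example from the text
print(detectable_hybrids(11, 8, 4))  # same target, array of complexity 4
```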

Arrays of oligonucleotides can be efficiently generated
by light-directed synthesis and can be used to determine the
identity of DNA target sequences. Because combinatorial
strategies are used, the number of compounds increases
exponentially while the number of chemical coupling cycles
increases only linearly. For example, synthesizing the
complete set of 4^8 (65,536) octanucleotides will add only four
hours to the synthesis for the 16 additional cycles.
Furthermore, combinatorial synthesis strategies can be
implemented to generate arrays of any desired composition.
For example, because the entire set of dodecamers (4^12) can be
produced in 48 photolysis and coupling cycles (b^n compounds
require b x n cycles), any subset of the dodecamers
(including any subset of shorter oligonucleotides) can be
constructed with the correct lithographic mask design in 48 or
fewer chemical coupling steps. In addition, the number of
compounds in an array is limited only by the density of
synthesis sites and the overall array size. Recent
experiments have demonstrated hybridization to probes
synthesized in 25 µm sites. At this resolution, the entire
set of 65,536 octanucleotides can be placed in an array
measuring 0.64 cm square, and the set of 1,048,576
decanucleotides (4^10) requires only a 2.56 cm array.
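
The array dimensions follow from the site size alone, as the short calculation below shows. It assumes a square array with one probe per 25 µm site and no spacing between sites; the function name is hypothetical. Note that 1,048,576 equals 4^10, whereas 4^12 is 16,777,216.

```python
import math


def array_side_cm(n_probes, site_um=25.0):
    """Side length in cm of the smallest square array holding n_probes,
    assuming one probe per site and no gaps between sites."""
    side_sites = math.isqrt(n_probes)
    if side_sites * side_sites < n_probes:
        side_sites += 1              # round up for non-square counts
    return side_sites * site_um / 1e4  # 10,000 microns per cm


for n in (8, 10, 12):
    print(n, 4 ** n, array_side_cm(4 ** n))
```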
Genome sequencing projects will ultimately be limited by
DNA sequencing technologies. Current sequencing methodologies
are highly reliant on complex procedures and require
substantial manual effort. Sequencing by hybridization has
the potential for transforming many of the manual efforts into
more efficient and automated formats. Light-directed
synthesis is an efficient means for large scale production of
miniaturized arrays for SBH. The oligonucleotide arrays are
not limited to primary sequencing applications. Because
single base changes cause multiple changes in the
hybridization pattern, the oligonucleotide arrays provide a
powerful means to check the accuracy of previously elucidated
DNA sequence, or to scan for changes within a sequence. In
the case of octanucleotides, a single base change in the
target DNA results in the loss of eight complements, and
generates eight new complements. Matching of hybridization
patterns may be useful in resolving sequencing ambiguities
from standard gel techniques, or for rapidly detecting DNA
mutational events. The potentially very high information
content of light-directed oligonucleotide arrays will change
genetic diagnostic testing. Sequence comparisons of hundreds
to thousands of different genes will be assayed simultaneously
instead of the current one- or few-at-a-time format. Custom
arrays can also be constructed to contain genetic markers for
the rapid identification of a wide variety of pathogenic
organisms.

Oligonucleotide arrays can also be applied to study the
sequence specificity of RNA or protein-DNA interactions.
Experiments can be designed to elucidate specificity rules of
non Watson-Crick oligonucleotide structures or to investigate
the use of novel synthetic nucleoside analogs for antisense or
triple helix applications. Suitably protected RNA monomers
may be employed for RNA synthesis. The oligonucleotide arrays
should find broad application deducing the thermodynamic and
kinetic rules governing formation and stability of
oligonucleotide complexes.

Other than the use of photoremovable protecting groups,
the nucleoside coupling chemistry is very similar to that used
routinely today for oligonucleotide synthesis. Fig. 48 shows
the deprotection, coupling, and oxidation steps of a solid
phase DNA synthesis method. Fig. 49 shows an illustrative
synthesis route for the nucleoside building blocks used in the
method. Fig. 50 shows a preferred photoremovable protecting
group, MeNPOC, and how to prepare the group in active form.
The procedures described below show how to prepare these
reagents. The nucleoside building blocks are
5'-MeNPOC-THYMIDINE-3'-OCEP; 5'-MeNPOC-N4-t-BUTYL
PHENOXYACETYL-DEOXYCYTIDINE-3'-OCEP; 5'-MeNPOC-N4-t-BUTYL
PHENOXYACETYL-DEOXYGUANOSINE-3'-OCEP; and 5'-MeNPOC-N4-t-BUTYL
PHENOXYACETYL-DEOXYADENOSINE-3'-OCEP.

1. Preparation of 4,5-methylenedioxy-2-nitroacetophenone



A solution of 50 g (0.305 mole) 3,4-methylenedioxyacetophenone
(Aldrich) in 200 mL glacial acetic acid was added
dropwise over 30 minutes to 700 mL of cold (2-4°C) 70% HNO3
with stirring (NOTE: the reaction will overheat without
external cooling from an ice bath, which can be dangerous and
lead to side products. At temperatures below 0°C, however,
the reaction can be sluggish. A temperature of 3-5°C seems to
be optimal). The mixture was left stirring for another 60
minutes at 3-5°C, and then allowed to approach ambient
temperature. Analysis by TLC (25% EtOAc in hexane) indicated
complete conversion of the starting material within 1-2 hr.
When the reaction was complete, the mixture was poured into ~3
liters of crushed ice, and the resulting yellow solid was
filtered off, washed with water and then suction-dried. Yield
~53 g (84%), used without further purification.

2. Preparation of 1-(4,5-methylenedioxy-2-nitrophenyl)
ethanol

[Reaction scheme: NaBH4, EtOH]

Sodium borohydride (10 g; 0.27 mol) was added slowly to a cold,
stirring suspension of 53 g (0.25 mol) of
4,5-methylenedioxy-2-nitroacetophenone in 400 mL methanol.
The temperature was kept below 10°C by slow addition of the
NaBH4 and external cooling with an ice bath. Stirring was
continued at ambient temperature for another two hours, at
which time TLC (CH2Cl2) indicated complete conversion of the
ketone. The mixture was poured into one liter of ice-water
and the resulting suspension was neutralized with ammonium
chloride and then extracted three times with 400 mL CH2Cl2 or
EtOAc (the product can be collected by filtration and washed
at this point, but it is somewhat soluble in water and this
results in a yield of only ~60%). The combined organic
extracts were washed with brine, then dried with MgSO4 and
evaporated. The crude product was purified from the main
byproduct by dissolving it in a minimum volume of CH2Cl2 or
THF (~175 ml) and then precipitating it by slowly adding hexane
(1000 ml) while stirring (yield 51 g; 80% overall). It can
also be recrystallized (e.g., toluene-hexane), but this
reduces the yield.

3. Preparation of 1-(4,5-methylenedioxy-2-nitrophenyl) ethyl
chloroformate (MeNPOC-Cl)

[Reaction scheme: toluene/THF]

Phosgene (500 mL of 20% w/v in toluene from Fluka: 965 mmole;
4 eq.) was added slowly to a cold, stirring solution of 50 g
(237 mmole; 1 eq.) of 1-(4,5-methylenedioxy-2-nitrophenyl)
ethanol in 400 mL dry THF. The solution was stirred overnight
at ambient temperature, at which point TLC (20% Et2O/hexane)
indicated >95% conversion. The mixture was evaporated (an
oil-less pump with downstream aqueous NaOH trap is recommended
to remove the excess phosgene) to afford a viscous brown oil.
Purification was effected by flash chromatography on a short
(9 x 13 cm) column of silica gel eluted with 20% Et2O/hexane.
Typically 55 g (85%) of the solid yellow MeNPOC-Cl is obtained
by this procedure. The crude material has also been
recrystallized in 2-3 crops from 1:1 ether/hexane. On this
scale, ~100 ml is used for the first crop, with a few percent
THF added to aid dissolution, and then cooling overnight at
-20°C (this procedure has not been optimized). The product
should be stored desiccated at -20°C.

4. Synthesis of 5'-MeNPOC-2'-deoxynucleoside-3'-(N,N-diisopropyl
2-cyanoethyl phosphoramidites)

[Reaction scheme: MeNPOC-Cl, pyridine]

Base = THYMIDINE (T); N-4-ISOBUTYRYL 2'-DEOXYCYTIDINE (ibu-dC);
N-2-PHENOXYACETYL 2'-DEOXYGUANOSINE (PAC-dG); and
N-6-PHENOXYACETYL 2'-DEOXYADENOSINE (PAC-dA)

All four of the 5'-MeNPOC nucleosides were prepared from the
base-protected 2'-deoxynucleosides by the following procedure.
The protected 2'-deoxynucleoside (90 mmol) was dried by
co-evaporating twice with 250 mL anhydrous pyridine. The
nucleoside was then dissolved in 300 mL anhydrous pyridine [...]
mmole) MeNPOC-Cl in 100 mL dry THF was then added with
stirring over 30 minutes. The ice bath was removed and the
solution allowed to stir overnight at room temperature [...]
5-10% MeOH in [...] two diastereomers). After evaporating
the solvents under vacuum, the crude material was [...]
The organic phase was [...] dried over
Na2SO4, filtered and evaporated to obtain a yellow foam. The
crude products were finally purified by [...]
with a stepped [...]
% MeOH in CH2Cl2). Yields of the purified diastereomeric
mixtures are in the range of 65-75%.

(b.) 5'-MeNPOC-2'-deoxynucleoside-3'-(N,N-diisopropyl
2-cyanoethyl phosphoramidites)

The four deoxynucleosides were phosphitylated using either
2-cyanoethyl-N,N-diisopropyl chlorophosphoramidite or
2-cyanoethyl-N,N,N',N'-tetraisopropylphosphorodiamidite. The
following is a typical procedure. Add 16.6 g (17.4 ml; 55
mmole) of 2-cyanoethyl-N,N,N',N'-tetraisopropylphosphorodiamidite
to a solution of 50 mmole 5'-MeNPOC-nucleoside and
4.3 g (25 mmole) diisopropylammonium tetrazolide in 250 mL dry
CH2Cl2 under argon at ambient temperature. Continue stirring
for 4-16 hours (reaction monitored by TLC: 45:45:10
hexane/CH2Cl2/Et3N). Wash the organic phase with saturated
aqueous NaHCO3 and brine, then dry over Na2SO4, and evaporate
to dryness. Purify the crude amidite by flash chromatography
(9 x 25 cm silica gel column eluted with hexane/CH2Cl2/TEA
[...]). The yield of
purified amidite is about 90%.



B. PREPARATION OF LABELED DNA/HYBRIDIZATION TO ARRAY

1. PCR

PCR amplification reactions are typically conducted in a
mixture composed of, per reaction: 1 µl genomic DNA; 10 µl
each primer (10 pmol/µl stocks); 10 µl 10 x PCR buffer (100 mM
Tris.Cl pH 8.5, 500 mM KCl, 15 mM MgCl2); 10 µl 2 mM dNTPs
(made from 100 mM dNTP stocks); 2.5 U Taq polymerase (Perkin
Elmer AmpliTaq™, 5 U/µl); and H2O to 100 µl. The cycling
conditions are usually 40 cycles (94°C 45 sec, 55°C 30 sec,
72°C 60 sec) but may need to be varied considerably from
sample type to sample type. These conditions are for 0.2 mL
thin wall tubes in a Perkin Elmer 9600 thermocycler. See
Perkin Elmer 1992/93 catalogue for 9600 cycle time
information. Target, primer length and sequence composition,
among other factors, may also affect parameters.

For products in the 200 to 1000 bp size range, check 2 µl
of the reaction on a 1.5% 0.5x TBE agarose gel using an
appropriate size standard (phiX174 cut with HaeIII is
convenient). The PCR reaction should yield several picomoles
of product. It is helpful to include a negative control
(i.e., 1 µl TE instead of genomic DNA) to check for possible
contamination. To avoid contamination, keep PCR products from
previous experiments away from later reactions, using filter
tips as appropriate. Using a set of working solutions and
storing master solutions separately is helpful, so long as one
does not contaminate the master stock solutions.

For simple amplifications of short fragments from genomic 
DNA it is, in general, unnecessary to optimize Mg2+
concentrations. A good procedure is the following: make a 
master mix minus enzyme; dispense the genomic DNA samples to 
individual tubes or reaction wells; add enzyme to the master 
mix; and mix and dispense the master solution to each well, 
using a new filter tip each time. 
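
The master mix volumes for a batch of reactions follow from the per-reaction recipe in Section 1 above. The sketch below is a convenience calculation under stated assumptions: the component labels, the function name master_mix, and the 10% pipetting overage are hypothetical and are not part of the protocol. Template DNA (1 µl per reaction) is dispensed separately, so each well receives 99 µl of mix.

```python
# Per-reaction volumes in microliters, taken from the PCR recipe.
PER_REACTION_UL = {
    "forward primer (10 pmol/ul)": 10.0,
    "reverse primer (10 pmol/ul)": 10.0,
    "10x PCR buffer": 10.0,
    "2 mM dNTPs": 10.0,
    "Taq polymerase (5 U/ul)": 2.5 / 5.0,  # 2.5 U at 5 U/ul
}
TEMPLATE_UL = 1.0   # genomic DNA, dispensed to each tube separately
FINAL_UL = 100.0    # final reaction volume


def master_mix(n_reactions, overage=0.1):
    """Total volume of each component for n_reactions.

    `overage` is an assumed fractional excess to cover pipetting losses.
    Following the text, the enzyme would be added to the mix last.
    """
    scale = n_reactions * (1.0 + overage)
    water = FINAL_UL - TEMPLATE_UL - sum(PER_REACTION_UL.values())
    mix = {name: vol * scale for name, vol in PER_REACTION_UL.items()}
    mix["H2O"] = water * scale
    return mix


for name, volume in master_mix(24).items():
    print(f"{name:30s} {volume:8.1f} ul")
```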

2. PURIFICATION

Removal of unincorporated nucleotides and primers from
PCR samples can be accomplished using the Promega Magic PCR
Preps DNA purification kit. One can purify the whole sample,
following the instructions supplied with the kit (proceed from
section IIIa, 'Sample preparation for direct purification from
PCR reactions'). After elution of the PCR product in 50 µl of
TE or H2O, one centrifuges the eluate for 20 sec at 12,000 rpm
in a microfuge and carefully transfers 45 µl to a new
microfuge tube, avoiding any visible pellet. Resin is
sometimes carried over during the elution step. This transfer
prevents accidental contamination of the linear amplification
reaction with 'Magic PCR' resin. Other methods, e.g., size
exclusion chromatography, may also be used.

3. Linear amplification

In a 0.2 mL thin-wall PCR tube mix: 4 µl purified PCR
product; 2 µl primer (10 pmol/µl); 4 µl 10 x PCR buffer; 4 µl
dNTPs (2 mM dA, dC, dG, 0.1 mM dT); 4 µl 0.1 mM dUTP; 1 µl 1
mM fluorescein dUTP (Amersham RPN 2121); 1 U Taq polymerase
(Perkin Elmer, 5 U/µl); and add H2O to 40 µl. Conduct 40
cycles (92°C 30 sec, 55°C 30 sec, 72°C 90 sec) of PCR. These
conditions have been used to amplify a 300 nucleotide
mitochondrial DNA fragment but are applicable to other
fragments. Even in the absence of a visible product band on
an agarose gel, there should still be enough product to give
an easily detectable hybridization signal. If one is not
treating the DNA with uracil DNA glycosylase (see Section 4),
dUTP can be omitted from the reaction.

4. Fragmentation

Purify the linear amplification product using the Promega
Magic PCR Preps DNA purification kit, as per Section 2 above.
In a 0.2 mL thin-wall PCR tube mix: 40 µl purified labeled
DNA; 4 µl 10 x PCR buffer; and 0.5 µl uracil DNA glycosylase
(BRL 1 U/µl). Incubate the mixture 15 min at 37°C, then 10 min
at 97°C; store at -20°C until ready to use.

5. Hybridization, Scanning & Stripping

A blank scan of the slide in hybridization buffer only is
helpful to check that the slide is ready for use. The buffer
is removed from the flow cell and replaced with 1 mL of
(fragmented) DNA in hybridization buffer and mixed well. The
scan is performed in the presence of the labeled target. Fig.
51 shows an illustrative detection system for scanning a
DNA chip. A series of scans at 30 min intervals using a
hybridization temperature of 25°C yields a very clear signal,
usually in at least 30 min to two hours, but it may be
desirable to hybridize longer, e.g., overnight. Using a laser
power of 50 µW and 50 µm pixels, one should obtain maximum
counts in the range of hundreds to low thousands/pixel for a
new slide. When finished, the slide can be stripped using 50%
formamide, rinsing well in deionized H2O, blowing dry, and
storing at room temperature.

C. PREPARATION OF LABELED RNA / HYBRIDIZATION TO ARRAY

1. Tagged primers

The primers used to amplify the target nucleic acid 
should have promoter sequences if one desires to produce RNA 
from the amplified nucleic acid. Suitable promoter sequences 
are shown below and include: 
(1) the T3 promoter sequence:
5'-CGGAATTAACCCTCACTAAAGG
5'-AATTAACCCTCACTAAAGGGAG;
(2) the T7 promoter sequence:
5'-TAATACGACTCACTATAGGGAG;
and (3) the SP6 promoter sequence:
5'-ATTTAGGTGACACTATAGAA.

The desired promoter sequence is added to the 5' end of the
PCR primer. It is convenient to add a different promoter to
each primer of a PCR primer pair so that either strand may be
transcribed from a single PCR product.
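
The construction of a tagged primer pair can be sketched in
Python as follows. The promoter sequences are those listed above
(the second of the two T3 sequences is used); the two
target-specific sequences are hypothetical placeholders, and the
function name is illustrative only.

```python
# Promoter sequences as listed above, written 5'->3'.
PROMOTERS = {
    "T3": "AATTAACCCTCACTAAAGGGAG",
    "T7": "TAATACGACTCACTATAGGGAG",
    "SP6": "ATTTAGGTGACACTATAGAA",
}


def tag_primer(promoter, target_specific):
    """Prepend the named promoter to the 5' end of a PCR primer."""
    return PROMOTERS[promoter] + target_specific.upper()


# Hypothetical 20-nucleotide target-specific sequences (placeholders).
forward = tag_primer("T3", "ACGTACGTACGTACGTACGT")
reverse = tag_primer("T7", "TGCATGCATGCATGCATGCA")

# A different promoter on each primer lets either strand be transcribed.
print(forward)
print(reverse)
```

With 20-nucleotide target-specific portions, each tagged primer in
this example is 42 nucleotides long.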

Synthesize PCR primers so as to leave the DMT group on.
DMT-on purification is unnecessary for PCR but appears to be
important for transcription. Add 25 µl 0.5 M NaOH to the
collection vial prior to collection of oligonucleotide to keep
the DMT group on. Deprotect using standard chemistry; 55°C
overnight is convenient.

HPLC purification is accomplished by drying down the
oligonucleotides, resuspending in 1 mL 0.1 M TEAA (dilute 2.0
M stock in deionized water, filter through 0.2 micron filter)
and filtering through a 0.2 micron filter. Load 0.5 mL on reverse
phase HPLC (column can be a Hamilton PRP-1 semi-prep, #79426).
The gradient is 0 -> 50% CH3CN over 25 min (program 0.2
µmol prep 0-50, 25 min). Pool the desired fractions, dry down,
resuspend in 200 µl 80% HAc. 30 min RT. Add 200 µl EtOH; dry
down. Resuspend in 200 µl H2O, plus 20 µl NaAc pH 5.5, 600 µl
EtOH. Leave 10 min on ice; centrifuge 12,000 rpm for 10 min
in microfuge. Pour off supernatant. Rinse pellet with 1 mL
EtOH, dry, resuspend in 200 µl H2O. Dry, resuspend in 200 µl
TE. Measure A260, prepare a 10 pmol/µl solution in TE (10 mM
Tris.Cl pH 8.0, 0.1 mM EDTA). Following HPLC purification of
a 42 mer, a yield in the vicinity of 15 nmol from a 0.2 µmol
scale synthesis is typical.
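
The dilution to a 10 pmol/µl working solution is a simple
C1 x V1 = C2 x V2 calculation, sketched below in Python. The
sketch assumes, purely for illustration, that the full typical
yield of 15 nmol is recovered in the 200 µl of TE; in practice the
stock concentration is the one determined from the A260 reading.

```python
def dilution_volumes(stock_pmol_per_ul, target_pmol_per_ul, final_ul):
    """Return (stock µl, diluent µl) for a C1*V1 = C2*V2 dilution."""
    if target_pmol_per_ul > stock_pmol_per_ul:
        raise ValueError("target concentration exceeds stock concentration")
    stock_ul = target_pmol_per_ul * final_ul / stock_pmol_per_ul
    return stock_ul, final_ul - stock_ul


# Assumed stock: 15 nmol (15,000 pmol) recovered in 200 µl TE.
stock_conc = 15_000 / 200  # pmol/µl

# Volumes needed to make 100 µl of primer at 10 pmol/µl.
stock_ul, te_ul = dilution_volumes(stock_conc, 10, 100)
print(f"{stock_ul:.1f} µl stock + {te_ul:.1f} µl TE")
```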

2. Genomic DNA Preparation

Add 500 µl (10 mM Tris.Cl pH 8.0, 10 mM EDTA, 100 mM
NaCl, 2% (w/v) SDS, 40 mM DTT, filter sterilized) to the
sample. Add 1.25 µl 20 mg/ml proteinase K (Boehringer).
Incubate at 55°C for 2 hours, vortexing once or twice.
Perform 2x 0.5 mL 1:1 phenol:CHCl3 extractions. After each
extraction, centrifuge 12,000 rpm 5 min in a microfuge and
recover 0.4 mL supernatant. Add 35 µl NaAc pH 5.2 plus 1 mL
EtOH. Place sample on ice 45 min; then centrifuge 12,000 rpm
30 min, rinse, air dry 30 min, and resuspend in 100 µl TE.

3. PCR

PCR is performed in a mixture containing, per reaction:
1 µl genomic DNA; 4 µl each primer (10 pmol/µl stocks); 4 µl
10 x PCR buffer (100 mM Tris.Cl pH 8.5, 500 mM KCl, 15 mM
MgCl2); 4 µl 2 mM dNTPs (made from 100 mM dNTP stocks); 1 U
Taq polymerase (Perkin Elmer, 5 U/µl); H2O to 40 µl. About 40
cycles (94°C 30 sec, 55°C 30 sec, 72°C 30 sec) are performed,
but cycling conditions may need to be varied. These conditions
are for 0.2 mL thin wall tubes in Perkin Elmer 9600. For
products in the 200 to 1000 bp size range, check 2 µl of the
reaction on a 1.5% 0.5xTBE agarose gel using an appropriate
size standard. For larger or smaller volumes (20 - 100 µl),
one can use the same amount of genomic DNA but adjust the
other ingredients accordingly.
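
One way to adjust the recipe for other reaction volumes is
sketched below in Python. It assumes that "accordingly" means
scaling every ingredient except the genomic DNA in proportion to
the total volume, with water making up the remainder; the names
used are illustrative only.

```python
BASE_TOTAL_UL = 40.0
GENOMIC_DNA_UL = 1.0  # held constant regardless of reaction volume

# Volumes (µl) in the 40 µl reaction described above.
BASE_MIX_UL = {
    "forward primer (10 pmol/µl)": 4.0,
    "reverse primer (10 pmol/µl)": 4.0,
    "10 x PCR buffer": 4.0,
    "2 mM dNTPs": 4.0,
    "Taq polymerase (5 U/µl)": 0.2,  # 1 U
}


def scale_pcr_mix(total_ul):
    """Scale the 40 µl recipe to total_ul, keeping genomic DNA fixed."""
    factor = total_ul / BASE_TOTAL_UL
    mix = {name: vol * factor for name, vol in BASE_MIX_UL.items()}
    mix["genomic DNA"] = GENOMIC_DNA_UL
    water = total_ul - sum(mix.values())
    if water < 0:
        raise ValueError("reaction volume too small for this recipe")
    mix["H2O"] = water
    return mix


for name, vol in scale_pcr_mix(100.0).items():
    print(f"{name}: {vol:.1f} µl")
```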

4. In vitro transcription 

Mix: 3 µl PCR product; 4 µl 5x buffer; 2 µl DTT; 2.4 µl
10 mM rNTPs (100 mM solutions from Pharmacia); 0.48 µl 10 mM
fluorescein-UTP (Fluorescein-12-UTP, 10 mM solution, from
Boehringer Mannheim); 0.5 µl RNA polymerase (Promega T3 or T7
RNA polymerase); and add H2O to 20 µl. Incubate at 37°C for 3
h. Check 2 µl of the reaction on a 1.5% 0.5xTBE agarose gel
using a size standard. 5x buffer is 200 mM Tris pH 7.5, 30 mM
MgCl2, 10 mM spermidine, 50 mM NaCl, and 100 mM DTT (supplied
with enzyme). The PCR product needs no purification and can
be added directly to the transcription mixture. A 20 µl
reaction is suggested for an initial test experiment and
hybridization; a 100 µl reaction is considered "preparative"
scale (the reaction can be scaled up to obtain more target).
The amount of PCR product to add is variable; typically a PCR
reaction will yield several picomoles of DNA. If the PCR
reaction does not produce that much target, then one should
increase the amount of DNA added to the transcription reaction
(as well as optimize the PCR). The ratio of fluorescein-UTP
to UTP suggested above is 1:5, but ratios from 1:3 to 1:10
all work well. One can also label with biotin-UTP and detect
with streptavidin-FITC to obtain similar results as with
fluorescein-UTP detection.
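
The volume of fluorescein-UTP needed for a chosen labeling ratio
can be worked out as in the Python sketch below. It assumes the
2.4 µl rNTP mix supplies UTP at 10 mM and that the labeled
nucleotide is also a 10 mM stock, as in the recipe above; the
function name is illustrative only.

```python
def labeled_utp_volume_ul(utp_ul, labeled_parts, unlabeled_parts,
                          utp_mM=10.0, labeled_mM=10.0):
    """Volume of labeled UTP giving labeled:unlabeled = parts ratio."""
    utp_nmol = utp_ul * utp_mM  # µl x mM = nmol
    labeled_nmol = utp_nmol * labeled_parts / unlabeled_parts
    return labeled_nmol / labeled_mM


# Ratios spanning the range said above to work well.
for ratio in [(1, 3), (1, 5), (1, 10)]:
    vol = labeled_utp_volume_ul(2.4, *ratio)
    print(f"{ratio[0]}:{ratio[1]} -> {vol:.2f} µl fluorescein-UTP")
```

At the suggested 1:5 ratio this reproduces the 0.48 µl given in
the recipe.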

For nondenaturing agarose gel electrophoresis of RNA,
note that the RNA band will normally migrate somewhat faster
than the DNA template band, although sometimes the two bands
will comigrate. The temperature of the gel can affect the
migration of the RNA band. The RNA produced from in vitro
transcription is quite stable and can be stored for months (at
least) at -20°C without any evidence of degradation. It can
be stored in unsterilized 6 x SSPE / 0.1% Triton X-100 at -20°C
for days (at least) and reused twice (at least) for
hybridization, without taking any special precautions in
preparation or during use. RNase contamination should of
course be avoided. When extracting RNA from cells, it is
preferable to work very rapidly and to use strongly denaturing
conditions. Avoid using glassware previously contaminated
with RNases. Use of new disposable plasticware (not
necessarily sterilized) is preferred, as new plastic tubes,
tips, etc., are essentially RNase free. Treatment with DEPC
or autoclaving is typically not necessary.

5. Fragmentation

Heat transcription mixture at 94°C for 40 min.
The extent of fragmentation is controlled by varying Mg2+
concentration (30 mM is typical), temperature, and duration of
heating.

6. Hybridization, Scanning, & Stripping 

A blank scan of the slide in hybridization buffer only is
helpful to check that the slide is ready for use. The buffer
is removed from the flow cell and replaced with 1 mL of
(hydrolysed) RNA in hybridization buffer and mixed well.
Incubate for 15 - 30 min at 18°C. Remove the hybridization
solution, which can be saved for subsequent experiments.
Rinse the flow cell 4-5 times with fresh changes of 6 x SSPE
/ 0.1% Triton X-100, equilibrated to 18°C. The rinses can be
performed rapidly, but it is important to empty the flow cell
before each new rinse and to mix the liquid in the cell
thoroughly. A series of scans at 30 min intervals using a
hybridization temperature of 25°C yields a very clear signal,
usually within 30 min to two hours, but it may be
desirable to hybridize longer, e.g., overnight. Using a laser
power of 50 µW and 50 µm pixels, one should obtain maximum
counts in the range of hundreds to low thousands/pixel for a
new slide. When finished, the slide can be stripped using
warm water.

These conditions are illustrative and assume a probe
length of ~15 nucleotides. The stripping conditions suggested
are fairly severe, but some signal may remain on the slide if
the washing is not stringent. Nevertheless, the counts
remaining after the wash should be very low in comparison to
the signal in presence of target RNA. In some cases, much
gentler stripping conditions are effective. The lower the
hybridization temperature and the longer the duration of
hybridization, the more difficult it is to strip the slide.
Longer targets may be more difficult to strip than shorter
targets.

7. Amplification of Signal 

A variety of methods can be used to enhance detection of
labelled targets bound to a probe on the array. In one
embodiment, the protein MutS (from E. coli) or equivalent
proteins such as yeast MSH1, MSH2, and MSH3; mouse Rep-3; and
Streptococcus Hex-A, is used in conjunction with target
hybridization to detect probe-target complexes that contain
mismatched base pairs. The protein, labeled directly or
indirectly, can be added to the chip during or after 
hybridization of target nucleic acid, and differentially binds 
to homo- and heteroduplex nucleic acid. A wide variety of 
dyes and other labels can be used for similar purposes. For 
instance, the dye YOYO-1 is known to bind preferentially to 
nucleic acids containing sequences comprising runs of 3 or 
more G residues. 



Detection of Repeat Sequences

In some circumstances, e.g., target nucleic acids with
repeated sequences or with high G/C content, very long probes
are sometimes required for optimal detection. In one
embodiment for detecting specific sequences in a target
nucleic acid with a DNA chip, repeat sequences are detected as
follows. The chip comprises probes of length sufficient to
extend into the repeat region varying distances from each end.
The sample, prior to hybridization, is treated with a labelled
oligonucleotide that is complementary to a repeat region but
shorter than the full length of the repeat. The target
nucleic acid is labelled with a second, distinct label. After
hybridization, the chip is scanned for probes that have bound
both the labelled target and the labelled oligonucleotide
probe; the presence of such bound probes shows that at least
two repeat sequences are present.
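
The final scanning step can be illustrated with the following
Python sketch, which picks out probes showing signal from both
labels. The per-probe counts, the probe identifiers, and the
single fixed threshold are hypothetical; an actual analysis would
depend on the scanner and on the two labels used.

```python
def probes_with_both_labels(target_signal, oligo_signal, threshold):
    """Return ids of probes above threshold in both label channels."""
    return sorted(
        probe_id
        for probe_id, counts in target_signal.items()
        if counts > threshold and oligo_signal.get(probe_id, 0) > threshold
    )


# Hypothetical counts per probe for each of the two labels.
target_counts = {"p1": 900, "p2": 850, "p3": 40}
oligo_counts = {"p1": 30, "p2": 700, "p3": 650}

hits = probes_with_both_labels(target_counts, oligo_counts, threshold=200)
print(hits)
```

In this made-up data set only probe p2 carries both labels.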

While the foregoing invention has been described in some 
detail for purposes of clarity and understanding, it will be 
clear to one skilled in the art from a reading of this 
disclosure that various changes in form and detail can be made 
without departing from the true scope of the invention. All 
publications and patent documents cited in this application 
are incorporated by reference in their entirety for all
purposes to the same extent as if each individual publication
or patent document were so individually denoted.

WHAT IS CLAIMED IS:

General tiling claims

1. An array of oligonucleotide probes immobilized on a
solid support, the array comprising at least two sets of
oligonucleotide probes,
(1) a first probe set comprising a plurality of
probes, each probe comprising a segment of at least three
nucleotides exactly complementary to a subsequence of the
reference sequence, the segment including at least one
interrogation position complementary to a corresponding
nucleotide in the reference sequence,
(2) a second probe set comprising a corresponding
probe for each probe in the first probe set, the corresponding
probe in the second probe set being identical to a sequence
comprising the corresponding probe from the first probe set or
a subsequence of at least three nucleotides thereof that
includes the at least one interrogation position, except that
the at least one interrogation position is occupied by a
different nucleotide in each of the two corresponding probes
from the first and second probe sets;
wherein the probes in the first probe set have at least
two interrogation positions respectively corresponding to each
of two contiguous nucleotides in the reference sequence.

2. An array of oligonucleotide probes immobilized on a
solid support, the array comprising at least four sets of
oligonucleotide probes,
(1) a first probe set comprising a plurality of
probes, each probe comprising a segment of at least three
nucleotides exactly complementary to a subsequence of the
reference sequence, the segment including at least one
interrogation position complementary to a corresponding
nucleotide in the reference sequence,
(2) second, third and fourth probe sets, each
comprising a corresponding probe for each probe in the first
probe set, the probes in the second, third and fourth probe
sets being identical to a sequence comprising the
corresponding probe from the first probe set or a subsequence
of at least three nucleotides thereof that includes the at
least one interrogation position, except that the at least one
interrogation position is occupied by a different nucleotide
in each of the four corresponding probes from the four probe
sets.



3. The oligonucleotide array of claim 2, further
comprising a fifth probe set comprising a corresponding probe
for each probe in the first probe set, the corresponding probe
from the fifth probe set being identical to a sequence
comprising the corresponding probe from the first probe set or
a subsequence of at least three nucleotides thereof that
includes the at least one interrogation position, except that
the at least one interrogation position is deleted in the
corresponding probe from the fifth probe set.

4. The oligonucleotide array of claim 2, further
comprising a sixth probe set comprising a corresponding probe
for each probe in the first probe set, the corresponding probe
from the sixth probe set being identical to a sequence
comprising the corresponding probe from the first probe set or
a subsequence of at least three nucleotides thereof that
includes the at least one interrogation position, except that
an additional nucleotide is inserted adjacent to the at least
one interrogation position.

5. The array of claim 2, wherein the first probe set has
at least three interrogation positions respectively
corresponding to each of three contiguous nucleotides in a
reference sequence.

6. The array of claim 2, wherein the first probe set has
at least 50 interrogation positions respectively corresponding
to each of 50 contiguous nucleotides in a reference sequence.

7. The array of claim 1 or 2, wherein the first probe
set has at least 100 interrogation positions respectively
corresponding to each of 100 contiguous nucleotides in a
reference sequence.

8. The oligonucleotide array of claim 1 or 2, wherein
the first probe set has an interrogation position
corresponding to each of at least 30% of the nucleotides in a
reference sequence and the reference sequence comprises at
least 100 nucleotides.

9. The oligonucleotide array of claim 8, wherein the
first probe set comprises probes which completely span the
reference sequence, which probes, relative to the reference
sequence, overlap one another in sequence.

10. The oligonucleotide array of claim 9, wherein the
first probe set has an interrogation position corresponding to
each of the nucleotides in the reference sequence.

11. The oligonucleotide array of claim 10, wherein the
probes are oligodeoxyribonucleotides.

12. The oligonucleotide array of claim 1 or 2, wherein
the array comprises between 100 and 10,000 probes.

13. The oligonucleotide array of claim 1 or 2, wherein
the array comprises between 10,000 and 100,000 probes.

14. The oligonucleotide array of claim 1 or 2, wherein
the array comprises between 100,000 and 10,000,000 probes.

15. The oligonucleotide array of claim 1 or 2, wherein
the probes are linked to the support via a spacer.

16. The oligonucleotide array of claim 1 or 2, wherein
the segment in each probe of the first probe set that is
exactly complementary to the subsequence of the reference
sequence is 9-21 nucleotides.

17. The oligonucleotide array of claim 16, wherein the
segment is n nucleotides long, and the subsequence is at least
n-2 nucleotides long.

18. The oligonucleotide array of claim 1 or 2, wherein
each probe of the first probe set consists of the segment that
is exactly complementary to the subsequence of the reference
sequence.

19. The oligonucleotide array of claim 1 or 2, wherein
the probes in the second, third and fourth probe sets are
identical to the corresponding probe from the first probe set
except that the at least one interrogation position is
occupied by a different nucleotide in each of the four
corresponding probes from the four probe sets.



20. The array of claim 2, further comprising fifth,
sixth and seventh probe sets, wherein:
the segment of each probe in the first set
includes at least two interrogation positions each
corresponding to a nucleotide in the reference sequence,
the second, third and fourth probe sets each
comprise a corresponding probe for each probe in the first
probe set, the corresponding probes in the second, third and
fourth probe sets being identical to a sequence comprising the
corresponding probe from the first probe set or a subsequence
of at least three nucleotides thereof that includes a first
interrogation position, except that the first interrogation
position is occupied by a different nucleotide in each of the
four corresponding probes from the four probe sets;
the fifth, sixth and seventh probe sets, each
comprising a corresponding probe for each probe in the first
probe set, the probes in the fifth, sixth and seventh probe
sets being identical to a sequence comprising the
corresponding probe from the first probe set or a subsequence
of at least three nucleotides thereof that includes a second
interrogation position, except that the second interrogation
position is occupied by a different nucleotide in each of the
four corresponding probes from the four probe sets.

21. The array of claim 2, wherein each probe in the
first probe set further comprises a second segment of at least
three nucleotides exactly complementary to a second
subsequence of the reference sequence, and the probes from the
second, third and fourth probe sets comprise the corresponding
probe from the first probe set or a subsequence thereof
comprising the first and second segments except in the at
least one interrogation position.

22. The array of claim 2, further comprising:
a fifth probe set comprising at least one probe
comprising a segment of at least seven nucleotides exactly
complementary to a subsequence of the reference sequence
except at one or two positions, the segment including at least
one interrogation position corresponding to a nucleotide in
the reference sequence not at the one or two positions;
sixth, seventh and eighth probe sets, each comprising a
probe for each probe in the fifth probe set, the corresponding
probes from the sixth, seventh and eighth probe sets being
identical to a sequence comprising the corresponding probe
from the fifth probe set or a subsequence of at least nine
nucleotides thereof including the at least one interrogation
position and the one or two positions, except in the at least
one interrogation position, which is occupied by a different
nucleotide in each of the four probes.

23. The array of claim 2, wherein the probes are
arranged on the substrate so that the first set of probes is
arranged in a row across the substrate in an order reflecting
the overlap between the probes and the reference sequence, and
the additional sets of probes are arranged in columns relative
to the probes in said first set, so that probes with the same
interrogation position are in the same column and so that each
column comprises at least four probes.

24. The array of claim 2, wherein said probes are 12 to
17 nucleotides in length.

25. The array of claim 2, wherein said probes are 15
nucleotides in length and attached by a covalent linkage to a
site on a 3'-end of said probes, and said interrogation
position is located at position 7, relative to the 3'-end of
said probes.

26. The array of claim 2, further comprising fifth,
sixth, seventh and eighth probe sets,
(1) a fifth probe set comprising a plurality of
probes, each probe comprising a segment of at least three
nucleotides exactly complementary to a subsequence of a second
reference sequence, the segment including at least one
interrogation position complementary to a corresponding
nucleotide in the reference sequence,
(2) the sixth, seventh, and eighth probe sets, each
comprising a corresponding probe for each probe in the fifth
probe set, the probes in the sixth, seventh and eighth probe
sets being identical to a sequence comprising the
corresponding probe from the fifth probe set or a subsequence
of at least three nucleotides thereof that includes the at
least one interrogation position, except that the at least one
interrogation position is occupied by a different nucleotide
in each of the four corresponding probes from the fifth,
sixth, seventh and eighth probe sets.

27. The array of claim 22, wherein the first, second,
third and fourth probe sets have probes of a first length and
the fifth, sixth, seventh and eighth probe sets have probes of
a second length different from the first length.
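
By way of illustration only, the basic tiling recited in claims 1
and 2 can be sketched in Python as follows. The reference
sequence is hypothetical, and the 15-nucleotide probe length and
central interrogation position are assumptions chosen for the
example (compare claim 25); the function names are likewise
illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def tile(reference, probe_len=15, interrogation=7):
    """Map each interrogated reference index to its four probes.

    `interrogation` is the 0-based index of the interrogation position
    within the probe written 5'->3'. Both defaults are illustrative.
    """
    tiling = {}
    for start in range(len(reference) - probe_len + 1):
        window = reference[start:start + probe_len]
        wildtype = reverse_complement(window)  # probe of the first set
        # Reference index paired with the probe's interrogation position.
        ref_index = start + probe_len - 1 - interrogation
        tiling[ref_index] = {
            base: wildtype[:interrogation] + base + wildtype[interrogation + 1:]
            for base in "ACGT"
        }
    return tiling


# Hypothetical 20-nucleotide reference sequence (not from the source).
reference = "ATGGCGTACGTTAGCCTAGA"
tiling = tile(reference)
for ref_index, probes in sorted(tiling.items()):
    print(ref_index, reference[ref_index], probes)
```

For each interrogated nucleotide, one of the four probes is
exactly complementary to the reference and the other three differ
from it only at the interrogation position.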

Tiling for wildtype and mutant reference sequences

28. An array of oligonucleotide probes immobilized on a
solid support, the array comprising at least one pair of first
and second probe groups, each group comprising a first and
second sets of oligonucleotide probes as defined by claim 1,
wherein each probe in the first probe set from the
first group is exactly complementary to a subsequence of a
first reference sequence and each probe in the first probe set
from the second group is exactly complementary to a
subsequence from a second reference sequence.

29. The array of claim 28, wherein the second reference
sequence is a mutated form of the first reference sequence.

30. The array of claim 28, wherein each group further
comprises third and fourth probe sets, each comprising a
corresponding probe for each probe in the first probe set, the
probes in the second, third and fourth probe sets being
identical to a sequence comprising the corresponding probe
from the first probe set or a subsequence of at least three
nucleotides thereof that includes the interrogation position,
except that the interrogation position is occupied by a
different nucleotide in each of the four corresponding probes
from the four probe sets.

31. The array of claim 30 that comprises at least five
pairs of first and second probe groups, wherein the probes in
the first probe sets from the first groups of the five pairs
are exactly complementary to subsequences from five different
respective first reference sequences.

32. The array of claim 30 that comprises at least forty
pairs of first and second probe groups, wherein the probes in
the first probe sets from the first groups of the forty pairs
are exactly complementary to subsequences from forty
respective first reference sequences.



Block tiling 

33. An array of oligonucleotide probes immobilized on a
solid support, the array comprising at least a group of probes
comprising:
a wildtype probe comprising a segment of at least three
nucleotides exactly complementary to a subsequence of a
reference sequence, the segment having at least first and
second interrogation positions corresponding to first and
second nucleotides in the reference sequence,
a first set of three mutant probes, each identical to a
sequence comprising the wildtype probe or a subsequence of at
least three nucleotides thereof including the first and second
interrogation positions, except in the first interrogation
position, which is occupied by a different nucleotide in each
of the three mutant probes and the wildtype probe;
a second set of three mutant probes, each identical to a
sequence comprising the wildtype probe or a subsequence of at
least three nucleotides thereof including the first and second
interrogation positions, except in the second interrogation
position, which is occupied by a different nucleotide in each
of the three mutant probes and the wildtype probe.

34. The array of claim 33, wherein the segment of the
wildtype probe comprises 3-20 interrogation positions
corresponding to 3-20 respective nucleotides in the reference
sequence, and the array comprises 3-20 respective sets of
three mutant probes, each of the three probes identical to a
sequence comprising the wildtype probe or a subsequence
thereof including the 3-20 interrogation positions, except
that one of the 3-20 interrogation positions is occupied by a
different nucleotide in each of the three mutant probes and
the wildtype probes, the one of the 3-20 interrogation
positions being different in each of the 3-20 respective sets
of three mutant probes.

35. An array of probes immobilized to a solid support
comprising two groups of probes, each group as defined by
claim 33, a first group comprising a wildtype probe comprising
a segment exactly complementary to a subsequence of a first
reference sequence and a second group comprising a wildtype
probe comprising a segment exactly complementary to a
subsequence of a second reference sequence.

36. The array of claim 35, comprising at least 10-100
groups of probes, each comprising a wildtype probe comprising
a segment exactly complementary to a subsequence of at least
10-100 respective reference sequences.
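
For illustration, a block of the kind recited in claims 33 and 34
(one wildtype probe shared by several sets of three mutant probes)
can be sketched in Python as follows. The 15-nucleotide wildtype
probe and the choice of five interrogation positions are
hypothetical, as is the function name.

```python
def block_tiling(wildtype_probe, interrogation_positions):
    """Return, per interrogation position, the three mutant probes."""
    block = {}
    for pos in interrogation_positions:
        block[pos] = [
            wildtype_probe[:pos] + base + wildtype_probe[pos + 1:]
            for base in "ACGT"
            if base != wildtype_probe[pos]
        ]
    return block


# Hypothetical 15-mer wildtype probe with five interrogation positions.
wildtype = "ACGTTGCATGCAAGT"
block = block_tiling(wildtype, range(5, 10))

# One shared wildtype probe plus three mutant probes per position.
n_probes = 1 + sum(len(mutants) for mutants in block.values())
print(n_probes)
```

Because the wildtype probe is shared, a block interrogating k
positions in this sketch needs 3k + 1 probes rather than 4k.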

Pooled probes 

37. A method of comparing a target sequence with a
reference sequence, the method comprising:
identifying variants of a reference sequence differing
from the reference sequence in at least one nucleotide;
assigning each variant a designation,
providing an array of pools of probes, each pool
occupying a separate cell of the array, wherein each pool
comprises a probe comprising a segment exactly complementary
to each variant sequence assigned a particular designation,
contacting the array with a target sequence comprising a
variant of the reference sequence;
determining the relative hybridization intensities of the
pools in the array to the target sequence;
determining the target sequence from the relative
hybridization intensities of the pools.

38. The method of claim 37, wherein the variants are
assigned numbers according to an error code.

39. The method of claim 37, wherein each variant is
assigned a designation having at least one digit and at least
one value for the digit, and each pool comprises a probe
comprising a segment exactly complementary to each variant
sequence assigned a particular value in a particular digit.

40. The method of claim 39, wherein the variants are
assigned successive numbers in a numbering system of base m
having n digits, and the array comprises n x (m-1) pools of
probes.

41. The method of claim 40, wherein [illegible]
comprises a probe comprising a segment exactly complementary
to the reference sequence.
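
The pooling scheme of claims 39 and 40 can be illustrated with the
Python sketch below. It rests on one reading of the n x (m-1)
count, namely that each pool corresponds to one non-zero value of
one digit and that a digit whose pools all stay dark is read as
zero; that reading, the idealized "lit" pools, and all names used
are assumptions made for the example.

```python
def to_digits(number, base, n_digits):
    """Digits of `number` in `base`, least significant first."""
    digits = []
    for _ in range(n_digits):
        number, digit = divmod(number, base)
        digits.append(digit)
    return digits


def build_pools(n_variants, base, n_digits):
    """Assign variant numbers 0..n_variants-1 to (digit, value) pools."""
    if n_variants > base ** n_digits:
        raise ValueError("not enough digits to number every variant")
    pools = {
        (digit, value): set()
        for digit in range(n_digits)
        for value in range(1, base)  # assumption: value 0 needs no pool
    }
    for variant in range(n_variants):
        for digit, value in enumerate(to_digits(variant, base, n_digits)):
            if value:
                pools[(digit, value)].add(variant)
    return pools


def decode(lit_pools, base):
    """Recover the variant number from the (digit, value) pools that lit."""
    return sum(value * base ** digit for digit, value in lit_pools)


pools = build_pools(n_variants=9, base=3, n_digits=2)
# Idealized readout: the pools that contain a probe for variant 7.
lit = [key for key, members in pools.items() if 7 in members]
print(len(pools), decode(lit, base=3))
```

With base 3 and two digits, nine variants are distinguished using
2 x (3-1) = 4 pools in this sketch.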



Trellis tiling 
42. A pooled probe comprising a segment exactly
complementary to a subsequence of a reference sequence except
at a first interrogation position occupied by a pooled
nucleotide N, a second interrogation position occupied by a
pooled nucleotide selected from the group of three consisting
of (1) M or K, (2) R or Y, and (3) S or W, and a third
interrogation position occupied by a second pooled nucleotide
selected from the group, wherein the pooled nucleotide
occupying the second interrogation position comprises a
nucleotide complementary to a corresponding nucleotide from
the reference sequence when the pooled probe and the
reference sequence are maximally aligned, and the pooled
nucleotide occupying the third interrogation position
comprises a nucleotide complementary to a corresponding
nucleotide from the reference sequence when the pooled
probe and the reference sequence are maximally aligned,
wherein N is A, C, G or T(U), K is G or T(U), M is A or C,
R is A or G, Y is C or T(U), W is A or T(U), and S is G or C.

43. An array of oligonucleotide probes immobilized on a
solid support, the array comprising:
first, second and third cells respectively occupied by
first, second and third pooled probes, each pooled probe
comprising a segment exactly complementary to a subsequence of
a reference sequence except at a first interrogation position
occupied by a pooled nucleotide N, a second interrogation
position occupied by a pooled nucleotide selected from the
group of three consisting of (1) M or K, (2) R or Y, and (3) S
or W, and a third interrogation position occupied by a second
pooled nucleotide selected from the group, wherein the pooled
nucleotide occupying the second interrogation position
comprises a nucleotide complementary to a corresponding
nucleotide from the reference sequence when the pooled probe
and the reference sequence are maximally aligned, and the
pooled nucleotide occupying the third interrogation position
comprises a nucleotide complementary to a corresponding
nucleotide from the reference sequence when the pooled probe
and the reference sequence are maximally aligned;
provided that one of the three interrogation
positions in each of the three pooled probes is aligned
with the same corresponding nucleotide in the reference
sequence, this interrogation position being occupied by an N
in one of the pooled probes, and a different pooled nucleotide
in each of the other two pooled probes.



44. The array of claim 43 further comprising:
fourth and fifth cells respectively occupied by fourth
and fifth pooled probes, each pooled probe as defined by
claim 43,
wherein one of the three interrogation positions in the
second, third and fourth pooled probes is aligned with the
same corresponding nucleotide in the reference sequence, this
interrogation position being occupied by an N in one of the
pooled probes, and a different pooled nucleotide in each of
the other two pooled probes,
wherein one of the three interrogation positions in the
third, fourth and fifth pooled probes is aligned with the same
corresponding nucleotide in the reference sequence, this
interrogation position being occupied by an N in one of the
pooled probes, and a different pooled nucleotide in each of
the other two pooled probes.

45. The array of claim 44, wherein the pooled probes are
identical except at the interrogation positions.

46. The array of claim 44, wherein the first, second,
third, fourth and fifth pooled probes are exactly
complementary to five respective subsequences of the reference
sequences that differ from each other by increments of one
nucleotide.

Bridge tiling 

47. [illegible] probes:
a first probe comprising first and second segments, each
of at least three nucleotides and exactly complementary to
first and second subsequences of a reference sequence, the
segments including at least one interrogation position
corresponding to a nucleotide in the reference sequence,
wherein either (i) the first and second subsequences are
noncontiguous, or (ii) the first and second subsequences are
contiguous and the first and second segments are inverted
relative to the complement of the first and second
subsequences in the reference sequence;
second, third and fourth probes, identical to a sequence
comprising the first probe or a subsequence thereof comprising
at least three nucleotides from each of the first and second
segments, except in the at least one interrogation position,
which differs in each of the probes.

48. [illegible] first and second subsequences are
[illegible] nucleotides in the reference sequence.

Two interrogation positions (no wildtype) 

49. An array of oligonucleotide probes on a solid support, 
the array comprising [...] a segment of at least 7 
nucleotides that is exactly complementary to a subsequence 
from a reference sequence, except that the [...] 
in first and second probes, the segment is exactly 
complementary to the subsequence, except at not more than one 
of the interrogation positions, and 
in third and fourth probes, the segment is exactly 
complementary to the subsequence, except at both of the 
interrogation positions. 

50. An array of probes immobilized to a support, the 
array comprising at least 100 sets of 4 probes, each set as 
defined by claim 49, the probes from the at least 100 sets 
comprising at least 100 respective segments, the segments 
having at least 100 respective first and second interrogation 
positions. 



Helper mutations 



51. An array of oligonucleotide probes immobilized on a 
solid support, the array comprising a set of probes 
comprising: 
a first probe comprising a segment of at least 7 
nucleotides exactly complementary to a subsequence of a 
reference sequence except at one or two positions, the segment 
including an interrogation position not at the one or two 
positions; 
second, third and fourth mutant probes, each identical to 
a sequence comprising the wildtype probe or a subsequence 
thereof including the interrogation position and the one or 
two positions, except in the interrogation position, which is 
occupied by a different nucleotide in each of the four probes. 






Omission of Perfectly Matched Probe 

52. An array of oligonucleotide probes immobilized on a 
solid support, the array comprising at least two sets of 
oligonucleotide probes, 
(1) a first probe set comprising a plurality of 
probes, each probe comprising a segment exactly complementary 
to a subsequence of at least 3 nucleotides of a reference 
sequence except at an interrogation position, 
(2) a second probe set comprising a corresponding 
probe for each probe in the first probe set, the corresponding 
probe in the second probe set being identical to a sequence 
comprising the corresponding probe from the first probe set or 
a subsequence of at least three nucleotides thereof that 
includes the interrogation position, except that the 
interrogation position is occupied by a different nucleotide 
in each of the two corresponding probes and the complement to 
the reference sequence, 
wherein the probes in the first probe set have at 
least three interrogation positions respectively corresponding 
to each of three contiguous nucleotides in the reference 
sequence. 



Methods 

53. A method of comparing a target nucleic acid with a 
reference sequence comprising a predetermined sequence of 
nucleotides, the method comprising: 
(a) hybridizing the target nucleic acid to an array 
of oligonucleotide probes immobilized on a solid support, the 
array comprising: 
(1) a first probe set comprising a plurality of 
probes, each probe comprising a segment of at least three 
nucleotides exactly complementary to a subsequence of the 
reference sequence, the segment including at least one 
interrogation position complementary to a corresponding 
nucleotide in the reference sequence, 
(2) a second probe set comprising a corresponding 
probe for each probe in the first probe set, the corresponding 
probe in the second probe set being identical to a sequence 
comprising the corresponding probe from the first probe set or 
a subsequence of at least three nucleotides thereof that 
includes the at least one interrogation position, except that 
the at least one interrogation position is occupied by a 
different nucleotide in each of the two corresponding probes 
from the first and second probe sets; 
wherein the probes in the first probe set have at 
least three interrogation positions respectively corresponding 
to each of at least three nucleotides in the reference 
sequence, and 
(b) determining which probes, relative to one 
another, in the array bind specifically to the target nucleic 
acid, the relative specific binding of the probes indicating 
whether the target sequence is the same or different from the 
reference sequence. 

54. The method of claim 53, wherein the array further 
comprises third and fourth probe sets, each comprising a 
corresponding probe for each probe in the first probe set, the 
probes in the second, third and fourth probe sets being 
identical to a sequence comprising the corresponding probe 
from the first probe set or a subsequence of at least three 
nucleotides thereof that includes the at least one 
interrogation position, except that the at least one 
interrogation position is occupied by a different nucleotide 
in each of the four corresponding probes from the four probe 
sets. 
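
By way of illustration only, the four probe sets recited in claims 53 and 54 can be sketched in Python. The function name tile_four_sets, the default probe length and the index of the interrogation position are assumptions made for this sketch and do not appear in the claims. Probes are written 3' to 5' so that they align base by base with a reference written 5' to 3'.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
BASES = "ACGT"


def complement(seq):
    """Base-by-base complement, kept in the same left-to-right order."""
    return "".join(COMPLEMENT[base] for base in seq)


def tile_four_sets(reference, probe_len=7, interrogation_index=3):
    """Build four corresponding probes for every window of the reference.

    Returns a list of (reference_position, probes) pairs, where probes maps
    the base placed at the interrogation position to the probe sequence.
    The four probes of a pair are identical except at that position.
    """
    tiles = []
    for start in range(len(reference) - probe_len + 1):
        # Probe exactly complementary to this window of the reference.
        wildtype = complement(reference[start:start + probe_len])
        probes = {
            base: wildtype[:interrogation_index]
            + base
            + wildtype[interrogation_index + 1:]
            for base in BASES
        }
        tiles.append((start + interrogation_index, probes))
    return tiles
```

In this sketch the probe whose interrogation base is the complement of the reference base plays the role of the first probe set; the other three entries correspond to the second, third and fourth probe sets.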

55. The method of claim 54, wherein the target sequence 
has a substituted nucleotide relative to the reference 
sequence in at least one undetermined position, and the 
relative specific binding of the probes indicates the location 
of the position and the nucleotide occupying the position in 
the target sequence. 
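
A minimal sketch of how relative binding might be read out to locate a substitution, as recited in claim 55, follows. It assumes, purely for illustration, that each interrogated reference position has one measured signal per probe base and that the strongest signal identifies the best-matched probe; the function name call_substitutions and the data layout are hypothetical, and a practical analysis would also apply intensity-ratio thresholds and handle ambiguous calls.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def call_substitutions(reference, intensities):
    """Report positions where the called target base differs from the reference.

    reference   -- reference sequence as a string
    intensities -- dict: reference position -> {probe base: signal}
    Returns a list of (position, reference_base, called_base) tuples.
    """
    substitutions = []
    for position in sorted(intensities):
        signals = intensities[position]
        # The probe with the highest signal is taken as the perfect match.
        best_probe_base = max(signals, key=signals.get)
        # The target base is the complement of the probe's interrogation base.
        called_base = COMPLEMENT[best_probe_base]
        if called_base != reference[position]:
            substitutions.append((position, reference[position], called_base))
    return substitutions
```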

56. The method of claim 54, wherein: 
the hybridizing step comprises hybridizing the 
target nucleic acid and a second target nucleic acid to the 
array; and 
the determining step comprises determining which 
probes, relative to one another, in the array bind 
specifically to the target nucleic acid or the second target 
nucleic acid, the relative specific binding of the probes 
indicating whether the target sequence is the same or 
different from the reference sequence and whether the second 
target sequence is the same or different from the reference 
sequence. 



57. The method of claim 56, wherein the target sequence 
has a label and the second target sequence has a second label 
different from the label. 



58. The method of claim 56, wherein undetermined first 
and second proportions of the first and second target 
sequences are hybridized to the array and the specific binding 
indicates the proportions. 
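
One simplified way to picture how specific binding could indicate the proportions of two targets is sketched below. It rests on an assumption that the claims do not state: at a position where the two targets differ, the signal of the probe matched to each target is taken to be proportional to that target's abundance. The function name estimate_proportions is hypothetical.

```python
def estimate_proportions(signal_first, signal_second):
    """Estimate the fractions of two targets from two probe signals.

    signal_first  -- signal of the probe perfectly matched to the first target
    signal_second -- signal of the probe perfectly matched to the second target
    Assumes signal is proportional to target abundance (a simplification).
    """
    total = signal_first + signal_second
    if total <= 0:
        raise ValueError("at least one signal must be positive")
    return signal_first / total, signal_second / total
```

In practice background subtraction and probe-specific hybridization efficiency would need to be taken into account before such a ratio could be relied upon.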



59. The method of claim 54, further comprising: 
(c) removing the target nucleic acid from the array; 
(d) hybridizing a second target nucleic acid to the 
array; 
(e) determining which probes, relative to one another, in 
the array bind specifically to the second target nucleic acid, 
the relative specific binding of the probes indicating whether 
the second target sequence is the same or different from the 
reference sequence. 



60. A method of comparing a target nucleic acid with a 
reference sequence comprising a predetermined sequence of 
nucleotides, the method comprising: 
hybridizing the target sequence to the array of 
claim 28; 
determining which probes in the first group, 
relative to one another, hybridize to the target sequence, the 
relative specific binding of the probes indicating whether the 
target sequence is the same or different from the first 
reference sequence; 
determining which probes in the second group, 
relative to one another, hybridize to the target sequence, the 
relative specific binding of the probes indicating whether the 
target sequence is the same or different from the second 
reference sequence. 

61. The method of claim 60, wherein the hybridizing step 
comprises hybridizing the target sequence and a second target 
sequence to the array, and the relative specific binding of 
the probes from the first group indicates that the target is 
identical to the first reference sequence, and the relative 
specific binding of the probes from the second group indicates 
that the second target sequence is identical to the second 
reference sequence. 

62. The method of claim 61, wherein the first and second 
target sequences are heterozygous alleles of a gene. 

Comparative hybridization 

63. A method of comparing a target nucleic acid with a 
reference sequence comprising a predetermined sequence of 
nucleotides, the method comprising: 
(a) hybridizing the reference sequence to an array 
of oligonucleotide probes immobilized on a solid support, the 
array comprising: 
(1) a first probe set comprising a plurality of 
probes, each probe comprising a segment of at least 3 
nucleotides exactly complementary to a subsequence of the 
reference sequence except in at least one interrogation 
position; 
(2) a second probe set comprising a corresponding 
probe for each probe in the first probe set, the corresponding 
probe in the second probe set being identical to a sequence 
comprising the corresponding probe from the first probe set or 
a subsequence of at least three nucleotides thereof that 
includes the at least one interrogation position, except that 
the at least one interrogation position is occupied by a 
different nucleotide in each of the two corresponding probes 
from the first and second probe sets; and 
(b) determining which probes, relative to one 
another, in the array bind specifically to the reference 
sequence; 
(c) hybridizing a target sequence to the array; 
(d) determining which probes, relative to one 
another, in the array bind specifically to the target 
sequence; 
wherein the relative specific binding of the probes 
to the reference and the target sequence indicates whether the 
reference sequence is the same or different from the target 
sequence. 



64. The method of claim 63, wherein the reference 
sequence has a first label and the second reference sequence 
has a second label different from the first label, and steps 
(a) and (c) are performed simultaneously. 



HIV Chip 



65. The array of claim 2, wherein the reference sequence 
is from a human immunodeficiency virus. 

66. The array of claim 65, wherein the reference 
sequence is from a reverse transcriptase gene of the human 
immunodeficiency virus. 

67. The array of claim 66, wherein the reference 
sequence is from a protease gene of the human immunodeficiency 
virus. 

68. The array of claim 66, wherein the reference 
sequence is a full-length reverse transcriptase gene. 

69. The array of claim 68 comprising at least 3200 
oligonucleotide probes. 

70. The array of claim 66, wherein the HIV gene is from 
the BRU HIV strain. 

71. The array of claim 66, wherein the HIV gene is from 
the SF2 HIV strain. 

72. The array of claim 28, wherein the reference 
sequence is from the coding strand of a reverse transcriptase 
gene of a human immunodeficiency virus and the second 
reference sequence is from the noncoding strand of the reverse 
transcriptase gene. 

73. The array of claim 28, wherein the first reference 
sequence is from a reverse transcriptase gene of a human 
immunodeficiency virus and the second reference sequence 
comprises a subsequence of the first reference sequence with a 
substitution of at least one nucleotide. 

74. The array of claim 73, wherein the substitution 
confers drug resistance to a human immunodeficiency virus 
comprising the second reference sequence. 

75. The array of claim 28, wherein the first and second 
reference sequences are from a reverse transcriptase gene from 
first and second strains of a human immunodeficiency virus. 

76. The array of claim 28, wherein the first reference 
sequence is from a reverse transcriptase gene of a human 
immunodeficiency virus and the second reference sequence is 
from a 16S RNA, or DNA encoding the 16S RNA, from a pathogenic 
microorganism. 

77. The array of claim 28, wherein the first reference 
sequence is from a reverse transcriptase gene of a human 
immunodeficiency virus and the second reference sequence is 
from a protease gene of the human immunodeficiency virus. 

78. The method of claim 54, wherein the reference 
sequence is from a human immunodeficiency virus. 

79. The method of claim 78, wherein the reference 
sequence is from a human immunodeficiency virus and the target 
sequence is from a second human immunodeficiency virus. 

80. The method of claim 79, wherein the target sequence 
has a substituted nucleotide relative to the reference 
sequence in at least one undetermined position, and the 
relative specific binding of the probes indicates the location 
of the position and the nucleotide occupying the position in 
the target sequence. 

81. The method of claim 80, wherein the target sequence 
has a substituted nucleotide relative to the reference 
sequence in at least one position, the substitution conferring 
drug resistance to the human immunodeficiency virus, and the 
relative specific binding of the probes reveals the 
substitution. 

82. The method of claim 78, wherein: 
the hybridizing step comprises hybridizing the 
target nucleic acid and a second target nucleic acid, the 
second target sequence being from a reverse transcriptase gene 
of a third human immunodeficiency virus, to the array; and 
the determining step comprises determining which 
probes, relative to one another, in the array bind 
specifically to the target nucleic acid or the second target 
nucleic acid, the relative specific binding of the probes 
indicating whether the target sequence is the same or 
different from the reference sequence and whether the second 
target sequence is the same or different from the reference 
sequence. 

83. The method of claim 82, wherein the first target 
sequence has a first label and the second target sequence has 
a second label different from the first label. 

84. The method of claim 82, wherein undetermined first 
and second proportions of the first and second target 
sequences are hybridized to the array and the specific binding 
indicates the proportions. 
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CFTR Chip 

85. The array of claim 2, wherein the reference sequence 
is from a CFTR gene. 

86. The array of claim 85, wherein the reference 
sequence is exon 10 of a CFTR gene, and said array comprises 
over 1000 oligonucleotide probes, 10 to 18 nucleotides in 
length. 



87. The array of claim 85, wherein said array comprises 
a set of probes comprising a specific nucleotide sequence 
selected from the group of sequences comprising: 
3'-TTTATAXTAG; 
3'-TTATAGXAGA; 
3'-TATAGTXGAA; 
3'-ATAGTAXAAA; 
3'-TAGTAGXAAC; 
3'-AGTAGAXACC; 
3'-GTAGAAXCCA; 
3'-TAGAAAXCAC; and 
3'-AGAAACXACA; wherein each set comprises 4 probes, and 
X is individually A, G, C, and T for each set. 
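
The sequences listed in claim 87 overlap one another with a one-nucleotide offset, and the pattern can be sketched as follows. The 18-nucleotide strand below is not recited in the claims; it is an assumption inferred by overlapping the nine listed sequences, and the function names are hypothetical. Each ten-nucleotide window has X at its seventh position from the 3' end, and each pattern expands into a set of four probes.

```python
# Probe strand written 3' to 5'; inferred by overlapping the listed sequences.
PROBE_STRAND = "TTTATAGTAGAAACCACA"


def tiled_patterns(strand, probe_len=10, x_index=6):
    """Return one pattern per window, with X marking the interrogation position."""
    patterns = []
    for start in range(len(strand) - probe_len + 1):
        window = strand[start:start + probe_len]
        patterns.append(window[:x_index] + "X" + window[x_index + 1:])
    return patterns


def expand(pattern):
    """Expand a pattern into its set of four probes (X = A, G, C, T)."""
    return [pattern.replace("X", base) for base in "AGCT"]


if __name__ == "__main__":
    for pattern in tiled_patterns(PROBE_STRAND):
        print("3'-" + pattern, expand(pattern))
```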




88. The array of claim 85, wherein said group of 
sequences comprises: 
3'-TTTATAXTAGAAACC; 
3'-TTATAGXAGAAACCA; 
3'-TATAGTXGAAACCAC; 
3'-ATAGTAXAAACCACA; 
3'-TAGTAGXAACCACAA; 
3'-AGTAGAXACCACAAA; 
3'-GTAGAAXCCACAAAG; 
3'-TAGAAAXCACAAAGG; and 
3'-AGAAACXACAAAGGA; wherein each set comprises 4 
probes, and X is individually A, G, C, and T for each set. 




89. The array of claim 32, wherein the forty first 
reference sequences are from a CFTR gene. 



WO 95/11995 

PCT/US94/12305 


90. The array of claim 89, wherein [...] the forty 
reference sequences includes a site of a mutation [...] at 
least one adjacent nucleotide. 

[...] contiguous nucleotides from a CFTR gene. 

1 92. 

2 



3 
4 



referlL s^encTis /f^" !!' " - «~* 

£ - st „„d of th i ;™ ;ir nce se9uen ~ is 



93. An array of oligonucleotide probes immobilized on a 
solid support, the array comprising at least a group of probes 
comprising: 
a wildtype probe exactly complementary [...] in the reference 
sequence, 
a first set of three mutant probes, each identical to the 
wildtype probe, except in a first of the five interrogation 
positions, which is occupied by a different nucleotide in each 
of the three mutant probes and the wildtype probe; 
a second set of three mutant probes, each identical to 
the wildtype probe, except in a second of the five 
interrogation positions, which is occupied by a different 
nucleotide in each of the three mutant probes and the wildtype 
probe; 
a third set of three mutant probes, each identical to the 
wildtype probe, except in a third of the five [...] 
nucleotide in each of the three mutant probes and the wildtype 
probe; 
[...] 
a fifth set of three mutant probes, each identical to the 
wildtype probe, except in a fifth of the five interrogation 
positions, which is occupied by a different nucleotide in each 
of the three mutant probes and the wildtype probe. 
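
As a sketch under stated assumptions, the group of probes in claim 93 (one wildtype probe plus a set of three mutant probes at each of five interrogation positions) might be generated as shown below. The function name block_tile, the dictionary layout and the choice of positions are hypothetical and serve only to illustrate the structure of the group.

```python
BASES = "ACGT"


def block_tile(wildtype, positions):
    """Build a wildtype probe plus three mutant probes per interrogation position.

    wildtype  -- wildtype probe sequence as a string
    positions -- iterable of 0-based interrogation positions within the probe
    Returns {"wildtype": str, "mutants": {position: [three mutant probes]}}.
    """
    group = {"wildtype": wildtype, "mutants": {}}
    for pos in positions:
        # Each mutant differs from the wildtype only at this position.
        group["mutants"][pos] = [
            wildtype[:pos] + base + wildtype[pos + 1:]
            for base in BASES
            if base != wildtype[pos]
        ]
    return group
```

With five interrogation positions the group contains sixteen probes: the wildtype probe is shared, and at every position the wildtype probe and its three mutant probes together present all four nucleotides.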



94. The array of claim 93 comprising first and second 
groups of probes, each group as defined by claim 93, the first 
group comprising a wildtype probe exactly complementary to a 
first reference sequence, and the second group comprising a 
wildtype probe exactly complementary to a second reference 
sequence, wherein the second reference sequence is a mutated 
form of the first reference sequence. 



95. The array of claim 94, wherein the first reference 
sequence is from a CFTR gene and the second reference sequence 
is a mutated form of the first reference sequence. 



96. The method of claim 56, wherein the target sequence 
and the second target sequence are from heterozygous alleles 
of a CFTR gene. 



97. The array of claim 2, wherein the reference sequence 
is a sequence from a p53 gene. 

98. The array of claim 2, wherein the reference sequence 
is from an hMLH1 gene. 

99. The array of claim 2, wherein the reference sequence 
is from an MSH2 gene. 

100. The array of claim 28, wherein the reference 
sequence is from a human p53 gene and the second reference 
sequence is from an hMLH1 gene. 

101. The array of claim 100, further comprising: 



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» nucleotides exactly « „ at least » 

• France seguenc th / ^ to a subseguence ot . 
7 "terrogation position co J"2 at least «d 

ide in the t Mrd °X™'» *> « 

< 2 > the tenth 7 s ^en=e. 

Prone sets -^ lng - lbeHt: . lctfi the tenth, eleventh and twelfth 

corresponding probe trm " »iu«=. co nprli 

3i ng the 

or at least three nucleotides th^oTt^ ^ ~ 3 
least one interrogation position "dudes the at 

Rogation position J oc cup Ted ^ <-* the at least one 

tenth T ' OUr "* rent "-^otid. 

tenth, ei8venth ^ no prob e s fro „ ^ 



has at least 60 interrogation islt ^ >™be «* 

CWl5UMS nucleotides frra eJt o„ 6 tl0nS «"••»««». to at so 

=^ue„ 1 =e' ia Th ^"7j f ;! s 7 98 ' " h »" 1 " the reference 
™«eotides long, and th e "j^' P«bes are „ 

c°»Ple*entary to the „ f ° f is exactly 

interrogation position J at pos^T^ *"» « one 

°f *ach prohe. which 3. -end if °" rei «ive to a 3.-!!! 
substrate. ls "valently attached to the 

Mitochondrial chip 

::r ... ,.,„.,„ 

10 5. The array of ri^ 

-~ - a se„ o/rti 1 ::; r u :;r saia — — 
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106 . The array of cl.i- 10. . —in o-loop reoion i. 
f ull-l®**gth. 

107. The array of claim 104, wherein said reference 
sequence is at least [...] of a full-length mitochondrial 
genome. 

108. The array of claim 104, wherein the reference 
sequence is bounded by positions 16280 
mitochondrial genome. 
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Single base-pair mismatch 
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